scala - Spark + Kafka integration - mapping of Kafka partitions to RDD partitions

March 15, 2011

I have a couple of basic questions related to Spark Streaming

[Please let me know if these questions have been answered in other posts - I couldn't find any]:

(i) In Spark Streaming, is the number of partitions in an RDD by default equal to the number of workers?

(ii) In the direct approach to Spark-Kafka integration, the number of RDD partitions created is equal to the number of Kafka partitions. Is it right to assume that each RDD partition i will be mapped to the same worker node j in every batch of the DStream? I.e., is the mapping of a partition to a worker node based solely on the index of the partition? For example, could partition 2 be assigned to worker 1 in one batch and to worker 3 in another?

Thanks in advance.

i) The default parallelism is the number of cores (or 8 on Mesos), but the number of partitions is up to the input stream implementation.
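For (i), here is a minimal sketch, assuming a local-mode Spark application (the master URL, app name, and object name are illustrative and not from this thread). It reads the default parallelism and shows that it only acts as a fallback: a source that is given, or chooses, its own partition count ignores it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DefaultParallelismSketch {
  // Returns (defaultParallelism, partitions with no hint, partitions with an explicit hint).
  def partitionCounts(cores: Int): (Int, Int, Int) = {
    val conf = new SparkConf()
      .setMaster(s"local[$cores]")            // assumption: local mode with `cores` threads
      .setAppName("default-parallelism-sketch")
    val sc = new SparkContext(conf)
    try {
      val default = sc.defaultParallelism
      // With no hint, parallelize falls back to the default parallelism.
      val fallback = sc.parallelize(1 to 1000).getNumPartitions
      // With an explicit hint, the source decides and the default is ignored.
      val explicit = sc.parallelize(1 to 1000, 10).getNumPartitions
      (default, fallback, explicit)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (default, fallback, explicit) = partitionCounts(4)
    println(s"defaultParallelism=$default fallback=$fallback explicit=$explicit")
  }
}
```

With `local[4]` and no `spark.default.parallelism` override, this should report 4, 4 and 10. A Kafka direct stream behaves like the explicit case: it creates one RDD partition per Kafka partition regardless of the default.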

ii) No, the mapping of partition indexes to worker nodes is not deterministic. If you're running Kafka on the same nodes as your Spark executors, the preferred location to run the task will be the node of the Kafka leader for that partition. But even then, the task may be scheduled on another node.
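For (ii), one way to check this on your own cluster is to log, per batch, which host processed each RDD partition index. The sketch below assumes the spark-streaming-kafka-0-10 integration; the broker address, topic name `events`, group id, and object/helper names are all hypothetical. The location strategy shown expresses a placement preference only, which is the point made above.

```scala
import java.net.InetAddress
import scala.collection.mutable

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}

object PartitionPlacementSketch {
  // Partition indexes that have been observed on more than one host.
  def movedIndexes(seen: collection.Map[Int, Set[String]]): Set[Int] =
    seen.collect { case (idx, hosts) if hosts.size > 1 => idx }.toSet

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("partition-placement-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Assumption: placeholder connection settings, adjust for your cluster.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "partition-placement-sketch",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent, // a preference, not a guarantee
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Driver-side record of every host seen for each RDD partition index.
    val seen = mutable.Map.empty[Int, Set[String]]

    stream.foreachRDD { (rdd, time) =>
      // RDD partition i corresponds to offset range i (one Kafka topic-partition).
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      val placements = rdd.mapPartitionsWithIndex { (idx, _) =>
        val r = ranges(idx)
        // Runs on the executor, so this is the host that processed the partition.
        Iterator((idx, r.topic, r.partition, InetAddress.getLocalHost.getHostName))
      }.collect()

      placements.foreach { case (idx, topic, kafkaPartition, host) =>
        seen(idx) = seen.getOrElse(idx, Set.empty[String]) + host
        println(s"$time rdd-partition=$idx <- $topic-$kafkaPartition on $host")
      }
      println(s"indexes seen on more than one host: ${movedIndexes(seen).toSeq.sorted}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the last line ever prints a non-empty list, the same partition index ran on different hosts in different batches, so code should not rely on index-to-node affinity.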
