scala - Spark + Kafka integration - mapping of Kafka partitions to RDD partitions
I have a couple of basic questions related to Spark Streaming
[please let me know if these questions have been answered in other posts - I couldn't find any]:
(i) In Spark Streaming, is the number of partitions in an RDD by default equal to the number of workers?
(ii) In the direct approach for Spark-Kafka integration, the number of RDD partitions created is equal to the number of Kafka partitions. Is it right to assume that each RDD partition i will be mapped to the same worker node j in every batch of the DStream? That is, is the mapping of a partition to a worker node based solely on the index of the partition? For example, could partition 2 be assigned to worker 1 in one batch and to worker 3 in another?
Thanks in advance.
i) The default parallelism is the number of cores (or 8 on Mesos), but the number of partitions is up to the input stream implementation.
ii) No, the mapping of partition indexes to worker nodes is not deterministic. If you're running Kafka on the same nodes as your Spark executors, the preferred location for a task is the node hosting the Kafka leader for that partition. Even then, the task may be scheduled on another node.
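If you want to check this on your own cluster, here is a rough sketch that prints the default parallelism, the partition count of each batch, and which host each partition index ran on. It is only an illustration, not the one way to do it: it assumes the newer `spark-streaming-kafka-0-10` direct stream API (the question may well predate it), and the broker address `localhost:9092`, the topic `events`, the group id, and the helper `movedPartitions` are all made-up placeholders.

```scala
import java.net.InetAddress

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object PartitionPlacementSketch {

  // Hypothetical helper: partitions whose host changed between two batches,
  // as partition index -> (previous host, current host).
  def movedPartitions(prev: Map[Int, String], cur: Map[Int, String]): Map[Int, (String, String)] =
    cur.collect {
      case (idx, host) if prev.get(idx).exists(_ != host) => (idx, (prev(idx), host))
    }

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("partition-placement-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    println(s"defaultParallelism = ${ssc.sparkContext.defaultParallelism}")

    // Placeholder connection settings; adjust for your environment.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "partition-placement-sketch",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // A location strategy only expresses a preference; the scheduler may still
    // run a task somewhere else.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Driver-side record of where each partition index ran in the last batch.
    var previous = Map.empty[Int, String]

    stream.foreachRDD { rdd =>
      // Run one task per partition and report the host it executed on.
      val current = rdd
        .mapPartitionsWithIndex { (idx, _) =>
          Iterator((idx, InetAddress.getLocalHost.getHostName))
        }
        .collect()
        .toMap

      println(s"partitions = ${rdd.getNumPartitions}, placement = $current")
      println(s"moved since last batch = ${movedPartitions(previous, current)}")
      previous = current
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With the direct stream, the partition count printed for each batch should match the number of Kafka partitions of the subscribed topic. Whether any partitions show up as moved depends on your cluster and scheduler load, so treat the output as an observation about your setup rather than a guarantee either way.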