pyspark - Parsing JSON messages with Spark Streaming in Python

I have a DStream of JSON messages of the form {"userid": "xxxx", "count": 000}. I want to figure out the best way to parse them so that I can create a data frame.

What is the difference between options 1 and 2 in this case?

parsed = kafkastream.map(lambda x: json.loads(x))     # option 1
parsed = kafkastream.map(lambda x: json.loads(x[1]))  # option 2

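This is a kafkastream-specific question. You are receiving a pair RDD from the Kafka DStream. Each element of a pair RDD is a two-element tuple (key, value), which is why you have to pick the second element to retrieve the value. Option 1 passes the whole tuple to json.loads, while option 2 passes only the message value. I would write: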

parsed = kafkastream.map(lambda (key, value): json.loads(value))

In Python it is recommended to use _ for an unused variable, but in this case I'd use key to remind me that the lambda is receiving a (key, value) pair.
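To make the difference concrete without a running cluster, here is a minimal sketch in plain Python. It assumes each record looks like the (key, value) tuples described above; the `records` list and the `parse_value` helper are hypothetical stand-ins, not part of the Spark or Kafka API. It uses a named function rather than a tuple-unpacking lambda, because `lambda (key, value): ...` is Python 2 syntax that Python 3 no longer accepts.

```python
import json

# Hypothetical stand-in for one micro-batch of the Kafka DStream:
# each record is a (key, value) tuple, and the value is the JSON string.
records = [
    (None, '{"userid": "u1", "count": 3}'),
    ("k2", '{"userid": "u2", "count": 0}'),
]

def parse_value(record):
    """Decode the JSON payload held in the second element of the pair."""
    key, value = record  # key is unused; it is named only to document the pair shape
    return json.loads(value)

# Equivalent in spirit to kafkastream.map(parse_value) on the real stream.
parsed = [parse_value(r) for r in records]
print(parsed)

# Option 1 from the question hands the whole tuple to json.loads,
# which only accepts str, bytes, or bytearray and so raises TypeError.
try:
    json.loads(records[0])
    whole_tuple_ok = True
except TypeError:
    whole_tuple_ok = False
print(whole_tuple_ok)
```

On the actual stream, the same function could presumably be passed as `kafkastream.map(parse_value)`, which also works on Python 3.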
