pyspark - Parsing JSON messages with Spark Streaming in Python
I have a DStream of JSON messages of the form {"userid": "xxxx", "count": 000}. I want to figure out the best way to parse them so that I can create a data frame.
What's the difference between 1 and 2 in this case:
parsed = kafkastream.map(lambda x: json.loads(x))
parsed = kafkastream.map(lambda x: json.loads(x[1]))
This is a kafkastream-specific question. You are receiving a pair RDD from the Kafka DStream, and each element of a pair RDD is a 2-element tuple (key, value). That is why you have to pick the 2nd element to retrieve the value. I would write:
parsed = kafkastream.map(lambda (key, value): json.loads(value))
In Python it's recommended to use _ for an unused variable, but in this case I'd use key to remind me that the lambda is receiving a pair RDD.
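To make the difference between the two versions concrete, here is a minimal sketch that runs without Spark. It assumes each record arrives as a (key, value) tuple, as described above. The sample records and the helper name parse_value are made up for illustration. The helper also stands in for the tuple-unpacking lambda, since that parameter syntax is only accepted by Python 2.

```python
import json

# Stand-in for the records a Kafka DStream delivers: (key, value) tuples.
# These sample keys and values are made up for illustration.
records = [
    (None, '{"userid": "u1", "count": 3}'),
    ("k2", '{"userid": "u2", "count": 5}'),
]


def parse_value(record):
    """Return the decoded JSON value of a (key, value) record."""
    key, value = record  # key is unused; it is named to document the pair shape
    return json.loads(value)


# Version 2 from the question: index into the tuple, then decode the value.
parsed_by_index = [json.loads(x[1]) for x in records]

# Python 3 friendly equivalent of the tuple-unpacking lambda.
parsed_by_name = list(map(parse_value, records))

# Version 1 from the question: json.loads receives the whole tuple and fails.
try:
    json.loads(records[0])
    version_1_error = None
except TypeError as exc:
    version_1_error = exc
```

With a real stream, the same helper could presumably be passed straight to the transformation, as in parsed = kafkastream.map(parse_value).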