pyspark - Parsing JSON messages with Spark Streaming in Python


I have a DStream of JSON messages of the form {"userid": "xxxx", "count": 000}. I want to figure out the best way to parse them so that I can create a data frame.

What's the difference between 1 and 2 in this case:

  1. parsed = kafkastream.map(lambda x: json.loads(x))
  2. parsed = kafkastream.map(lambda x: json.loads(x[1]))

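This is a KafkaStream-specific question. You are receiving a pair RDD from the Kafka DStream. A pair RDD holds 2-element tuples (key, value), which is why you have to pick the 2nd element to retrieve the value. You can also write the following (Python 2 only, since Python 3 removed tuple parameters in lambdas):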

parsed = kafkastream.map(lambda (key, value): json.loads(value)) 

In Python it's recommended to use _ for an unused variable, but in this case I'd use key to remind me that the lambda is receiving a pair RDD.
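
To make the difference between 1 and 2 concrete, here is a minimal sketch in plain Python with no Spark involved. The records list and the parse_value helper are hypothetical stand-ins for what the Kafka DStream delivers, assuming each record is a (key, value) tuple whose value is a JSON string.

```python
import json

# Hypothetical stand-ins for records from the Kafka DStream:
# each one is a (key, value) tuple, and the value is a JSON string.
records = [
    (None, '{"userid": "u1", "count": 3}'),
    ("k2", '{"userid": "u2", "count": 7}'),
]


def parse_value(record):
    """Unpack the (key, value) pair and decode the JSON value."""
    key, value = record  # key is unused; named only as a reminder of the pair
    return json.loads(value)


# Equivalent of option 2: only the value of each pair is decoded.
parsed = list(map(parse_value, records))

# Equivalent of option 1: the whole tuple is handed to json.loads,
# which only accepts str, bytes or bytearray, so it raises TypeError.
try:
    json.loads(records[0])
    option_one_error = None
except TypeError as exc:
    option_one_error = exc
```

A named function like parse_value also works on Python 3, where the lambda (key, value) form is a syntax error. With a real stream it would presumably be passed as kafkastream.map(parse_value).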

