python - Naming Variables using PySpark
Even though the problem is pretty simple, since I'm new to Spark I'm having issues resolving it.

The normal Python code would be the following:
    for line in open('schedule.txt'):
        origin, dest, depart, arrive, price = line.split(',')

I read the file as:
    sched = sc.textFile('/path/schedule.txt')

but when trying the following code:
    origin,dest,depart,arrive,price = sched.split(',')

I'm getting this error:
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-46-ba0e8c07ca89> in <module>()
    ----> 1 origin,dest,depart,arrive,price=sched.split(',')

    AttributeError: 'RDD' object has no attribute 'split'

I can split the file using a lambda function, but I don't know how to create the 5 variable names.
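What I have so far is something like this (a sketch of the lambda split I mentioned):

    # split every line of the RDD into its fields
    parts = sched.map(lambda line: line.split(','))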
Any help would be appreciated.
sched = sc.textFile('/path/schedule.txt') returns an RDD, which is a different datatype from a Python file object and supports a different API. The equivalent of your Python code would look something like this:
    sched = sc.textFile('/path/schedule.txt')
    # extract the values from each line
    vals = sched.map(lambda line: line.split(','))
    # now you can do the processing, for example summing the prices
    # (price is the fifth column; convert it to a number first)
    total_price = vals.map(lambda v: float(v[4])).reduce(lambda a, b: a + b)
    # or collect the raw values
    raw_vals = vals.collect()
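To sanity-check the parsing, you could peek at a few parsed rows first (a quick sketch; the column layout origin,dest,depart,arrive,price is assumed from your question):

    # look at the first three parsed rows before doing any aggregation
    print(vals.take(3))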
Update: If you want to be able to access the values of each line as local variables, define a dedicated function instead of a lambda and pass it to .map():

    def process_line(line):
        origin, dest, depart, arrive, price = line.split(',')
        # do whatever you need with the values here
        # remember to return the result
        return origin, dest, depart, arrive, price

    sched.map(process_line)
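For example, a hypothetical filter on the returned tuples (the 200 threshold is made up, and price is assumed to be the fifth column):

    # keep only the flights cheaper than some threshold
    cheap = sched.map(process_line).filter(lambda t: float(t[4]) < 200)
    print(cheap.take(5))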
Update 2: The specific processing you want to do on the file is not trivial, because it requires writing to a shared variable (flights). Instead, I'd suggest grouping the lines by (origin, dest), collecting the results and inserting them into a dict:
    flights_data = (sched.map(lambda line: line.split(','))
                         .map(lambda v: ((v[0], v[1]), tuple(v[2:])))
                         .groupByKey().collect())
    flights = {f: list(ds) for f, ds in flights_data}
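Note that groupByKey yields an iterable per key, hence the list(ds) above. You can then look up a route directly (the airport codes here are hypothetical; use keys that actually occur in your schedule.txt):

    # all (depart, arrive, price) tuples for one origin/destination pair
    print(flights[('JFK', 'LAX')])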