R - In SparkR 1.5.0, how do we specify a column unambiguously after a join on a common column?


I joined two DataFrames on a column that has the same name in both.

oe = join(orders, emp, orders$employeeid == emp$employeeid) 

The resulting DataFrame has two columns with the same name, employeeid.

Now grouping by that column, or even just printing it,

peremp = groupBy(oe, 'employeeid', sales = n(oe$orderid))
oe$employeeid

fails with the error:

Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Reference 'employeeid' is ambiguous, could be: employeeid#36, employeeid#69.;

You can access the columns through the parent DataFrame. First, let's create some example data:

df1 <- createDataFrame(sqlContext, data.frame(id=c(1, 2, 3), v=c("a", "b", "c")))
df2 <- createDataFrame(sqlContext, data.frame(id=c(2, 3), v=c("g", "z")))
df <- join(df1, df2, df1$id == df2$id)
head(df)
##   id v id v
## 1  3 c  3 z
## 2  2 b  2 g

Selecting the v column by name is ambiguous, but selecting it through the parent DataFrame works:

library(magrittr)  # provides the %>% pipe used below

select(df, "v")
## 15/09/30 17:47:13 ERROR RBackendHandler: select on 131 failed
## Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
##   org.apache.spark.sql.AnalysisException: Reference 'v' is ambiguous,
## ....

select(df, df1$v) %>% head
##   v
## 1 c
## 2 b
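Applied to the DataFrames from the question, the same idea could look like the sketch below. It assumes that `orders` and `emp` already exist as SparkR DataFrames with the column names used in the question, and that `employeeid` is the only column name they share. The names `emp2` and `emp_employeeid` are hypothetical and introduced only for illustration. The second variant renames the key on one side before joining, so the result never contains a duplicate name.

```r
library(SparkR)

# Assumption: `orders` and `emp` are existing SparkR DataFrames that both
# contain an `employeeid` column, and `orders` contains `orderid`.
oe <- join(orders, emp, orders$employeeid == emp$employeeid)

# Variant 1: reference the columns through the parent DataFrame,
# so Spark knows which of the two `employeeid` columns is meant.
peremp <- agg(groupBy(oe, orders$employeeid), sales = n(orders$orderid))
head(peremp)

# Variant 2: rename the key on one side before joining
# (`emp2` and `emp_employeeid` are hypothetical names).
emp2 <- withColumnRenamed(emp, "employeeid", "emp_employeeid")
oe2 <- join(orders, emp2, orders$employeeid == emp2$emp_employeeid)

# `employeeid` is now unique in the joined result, so the plain name works.
peremp2 <- agg(groupBy(oe2, "employeeid"), sales = n(oe2$orderid))
head(peremp2)
```

Variant 1 leaves the joined DataFrame untouched but requires keeping the parent DataFrames around. Variant 2 costs one extra step up front and lets later code refer to columns by plain name.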
