r - In SparkR 1.5.0, how do we specify a column unambiguously after a join on a common column?
I joined two DataFrames on a column with the same name.
oe = join(orders, emp, orders$employeeid == emp$employeeid)
The resulting DataFrame has two columns with the same name, employeeid.
Now a group-by, or even just printing the column,
peremp = groupBy(oe, 'employeeid', sales = n(oe$orderid))
oe$employeeid
fails with the error
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
org.apache.spark.sql.AnalysisException: Reference 'employeeid' is ambiguous, could be: employeeid#36, employeeid#69.;
You can access the columns through the parent data frame. First, let's create example data:
df1 <- createDataFrame(sqlContext, data.frame(id=c(1, 2, 3), v=c("a", "b", "c")))
df2 <- createDataFrame(sqlContext, data.frame(id=c(2, 3), v=c("g", "z")))
df <- join(df1, df2, df1$id == df2$id)
head(df)
##   id v id v
## 1  3 c  3 z
## 2  2 b  2 g
Now try to access the v column. Selecting it by bare name fails, while selecting it through df1 works:
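library(magrittr)  # provides %>%

# Selecting by bare name is ambiguous, because both parents have a v column
select(df, "v")
## 15/09/30 17:47:13 ERROR RBackendHandler: select on 131 failed
## Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
##   org.apache.spark.sql.AnalysisException: Reference 'v' is ambiguous,
## ....

# Selecting through the parent data frame resolves the ambiguity
select(df, df1$v) %>% head
##   v
## 1 c
## 2 b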
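The same idea can be applied to the grouping in the question. The following is only a sketch: the contents of orders and emp are made-up stand-ins (the question does not show them), and it assumes the SparkR 1.5 entry points sparkR.init and sparkRSQL.init. In the sparkR shell, sc and sqlContext already exist, so those two lines can be skipped.

```r
library(SparkR)

# Assumption: running as a script; the sparkR shell already provides these.
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Hypothetical stand-ins for the question's tables (real schemas are not shown).
orders <- createDataFrame(sqlContext, data.frame(
  orderid = c(1, 2, 3),
  employeeid = c(10, 10, 20)))
emp <- createDataFrame(sqlContext, data.frame(
  employeeid = c(10, 20),
  name = c("ann", "bob"),
  stringsAsFactors = FALSE))

oe <- join(orders, emp, orders$employeeid == emp$employeeid)

# Refer to the grouping column through its parent DataFrame (orders),
# not by the bare name 'employeeid', which exists on both sides of the join.
peremp <- agg(groupBy(oe, orders$employeeid), sales = n(orders$orderid))
head(peremp)
```

Passing orders$employeeid rather than the string 'employeeid' is what removes the ambiguity; using emp$employeeid would work equally well for an inner join on that key.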