r - How to load a CSV file into SparkR on RStudio?
How do I load a CSV file into SparkR on RStudio? Below are the steps I had to perform to run SparkR on RStudio. I have used read.df to read the .csv, and I am not sure how else to write this. I am also not sure whether this step is considered to create RDDs.
# Set system environment variables
Sys.setenv(SPARK_HOME = "c:/users/desktop/spark/spark-1.4.1-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
#Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')

# Load libraries
library(SparkR)
library(magrittr)

sc <- sparkR.init(master="local")
sc <- sparkR.init()
sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
sqlContext <- sparkRSQL.init(sc)

data <- read.df(sqlContext, "c:/users/desktop/datasets/hello_world.csv", "com.databricks.spark.csv", header="true")

I am getting this error:
Error in writeJobj(con, object) : invalid jobj 1
Spark 2.0.0+:
You can use the built-in csv data source:
loadDF(sqlContext, path="some_path", source="csv", header="true")
This works without loading spark-csv.
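For what it's worth, here is a minimal sketch of how the same read might look with the Spark 2.x session API, assuming Spark 2.0.0 or later and the SparkR package on the library path. The wrapper name `load_csv` and its `master` default are hypothetical, chosen only for illustration, and "some_path" remains a placeholder.

```r
# Hypothetical wrapper (not part of SparkR): start a session and read a CSV.
# Calls are namespace-qualified, so SparkR is only needed when the function runs.
load_csv <- function(path, master = "local[*]") {
  # In Spark 2.x, sparkR.session() takes the place of sparkR.init()
  # and sparkRSQL.init().
  SparkR::sparkR.session(master = master)
  # The built-in "csv" source needs no extra package;
  # header = "true" treats the first row as column names.
  SparkR::read.df(path, source = "csv", header = "true")
}

# Usage ("some_path" is a placeholder):
# df <- load_csv("some_path")
# SparkR::showDF(df)
```

Wrapping the calls in a function keeps session start-up and the read in one place. Calling the two SparkR functions directly at the console should behave the same way.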
Original answer:
As far as I can tell, you're using the wrong version of spark-csv. Pre-built versions of Spark use Scala 2.10, but you're using the spark-csv build for Scala 2.11. Try this instead:
sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.10:1.2.0")
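To make the Scala-version point concrete, here is a small sketch in base R that builds the package coordinate and reads the Scala suffix back out of it. Both helpers, `spark_csv_coordinate` and `scala_version_of`, are hypothetical names invented for illustration and are not part of SparkR or spark-csv.

```r
# Hypothetical helper: build a spark-csv coordinate for a given Scala version.
spark_csv_coordinate <- function(scala_version = "2.10", package_version = "1.2.0") {
  sprintf("com.databricks:spark-csv_%s:%s", scala_version, package_version)
}

# Hypothetical helper: extract the Scala version suffix from a coordinate
# of the form "group:artifact_scalaVersion:version".
scala_version_of <- function(coordinate) {
  artifact <- strsplit(coordinate, ":", fixed = TRUE)[[1]][2]
  sub("^.*_", "", artifact)
}

coord <- spark_csv_coordinate("2.10", "1.2.0")

# Fail early if the coordinate does not target Scala 2.10,
# which the answer says pre-built Spark versions use.
stopifnot(scala_version_of(coord) == "2.10")

# The checked coordinate could then be passed on (requires SparkR):
# sc <- sparkR.init(master = "local", sparkPackages = coord)
```

Run against the coordinate from the question, "com.databricks:spark-csv_2.11:1.0.3", the same check would stop before Spark is started.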