Python - "You must build Spark with Hive" error (Spark 1.5.0)
I downloaded the pre-built Spark 1.5.0 and ran this simple code via pyspark:

from pyspark.sql import Row
l = [('Alice', 1)]
sqlContext.createDataFrame(l).collect()

It yields this error:
15/09/30 06:48:48 INFO DataStore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\bigdata\spark-1.5\spark-1.5.0\python\pyspark\sql\context.py", line 408, in createDataFrame
    jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
  File "c:\bigdata\spark-1.5\spark-1.5.0\python\pyspark\sql\context.py", line 660, in _ssql_ctx
    "build/sbt assembly", e)
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o28))
So I tried to compile it myself:

c:\bigdata\spark-1.5\spark-1.5.0>.\build\apache-maven-3.3.3\bin\mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests -Phive -Phive-thriftserver clean package

but the compiled version still gives the same error.

Any suggestions?
Add these lines after importing Row:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext('local', 'pyspark')
sqlContext = SQLContext(sc)
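For reference, here is a minimal self-contained sketch of the whole script with that fix applied. It assumes you run it as a standalone script (not inside the pyspark shell, which creates its own contexts) and that you only need a plain SQLContext rather than Hive support; the column names 'name' and 'age' are just for illustration.

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

# Local SparkContext plus a plain (non-Hive) SQLContext.
sc = SparkContext('local', 'pyspark')
sqlContext = SQLContext(sc)

# Build a DataFrame from a list of tuples; the second argument names the columns.
l = [('Alice', 1)]
df = sqlContext.createDataFrame(l, ['name', 'age'])
print(df.collect())  # e.g. [Row(name=u'Alice', age=1)]

sc.stop()

As far as I can tell, a plain SQLContext never touches the Hive metastore, so it avoids the HiveContext initialization that raises the "You must build Spark with Hive" exception in the first place.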