SQLITE_ERROR: Connection is closed when connecting from Spark via JDBC to SQLite database


I am using Apache Spark 1.5.1 and trying to connect to a local SQLite database named clinton.db. Creating a data frame from a table of the database works fine, but when I do some operations on the created object, I get the error below, which says "SQL error or missing database (Connection is closed)". Funny thing is that I get the result of the operation nevertheless. Any idea how I can solve the problem, i.e., avoid the error?

Start command for spark-shell:

../spark/bin/spark-shell --master local[8] --jars ../libraries/sqlite-jdbc-3.8.11.1.jar --classpath ../libraries/sqlite-jdbc-3.8.11.1.jar 

Reading the database:

val emails = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlite:../data/clinton.sqlite", "dbtable" -> "emails")).load()

Simple count (fails):

emails.count 

Error:

15/09/30 09:06:39 WARN JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

res1: Long = 7945

I got the same error today, and the important line is the one right before the exception:

15/11/30 12:13:02 INFO jdbc.JDBCRDD: closed connection

15/11/30 12:13:02 WARN jdbc.JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)

So Spark succeeded in closing the JDBC connection, and then it fails to close the JDBC statement.
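The failure itself comes from the sqlite-jdbc driver: once the connection is gone, the driver throws when you try to close a statement that belonged to it. A minimal sketch outside Spark (assuming sqlite-jdbc 3.8.11.1 on the classpath and a hypothetical test.db file) should reproduce the same exception:

import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:sqlite:test.db") // hypothetical path
val stmt = conn.createStatement()
stmt.executeQuery("SELECT 1")
conn.close() // connection closed first, as in Spark's log above
stmt.close() // throws [SQLITE_ERROR] SQL error or missing database (Connection is closed)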


Looking at the source, close() is called twice:

Line 358 (org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, Spark 1.5.1):

context.addTaskCompletionListener{ context => close() }

Line 469:

override def hasNext: Boolean = {
  if (!finished) {
    if (!gotNext) {
      nextValue = getNext()
      if (finished) {
        close()
      }
      gotNext = true
    }
  }
  !finished
}
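Both paths fire for the same task: the iterator closes as soon as the last row has been fetched, and the task-completion listener closes again when the task ends. A hypothetical sketch (not Spark source, assuming a local spark-shell where sc is available) shows the same pattern:

sc.parallelize(1 to 3, 1).mapPartitions { iter =>
  def close(): Unit = println("close() called")
  // path 1: task-completion listener, as in JDBCRDD line 358
  org.apache.spark.TaskContext.get().addTaskCompletionListener { _ => close() }
  new Iterator[Int] {
    def hasNext = iter.hasNext
    def next() = {
      val v = iter.next()
      // path 2: close as soon as the data is exhausted (JDBCRDD does this in hasNext)
      if (!iter.hasNext) close()
      v
    }
  }
}.count()
// prints "close() called" twice for the single task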

If you look at the close() method (line 443):

def close() {
  if (closed) return

you can see that it checks the variable closed, but that value is never set to true.
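So the guard exists but is never armed, and the second call runs the full cleanup again against an already-closed connection. A minimal sketch of a fix (my suggestion, not an official patch) would be to set the flag at the end of the method:

def close() {
  if (closed) return
  // ... close the ResultSet, Statement and Connection as before ...
  closed = true // arm the guard so the second call becomes a no-op
}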

If I see it correctly, this bug is still present in master. I have filed a bug report.

