apache spark - BigQuery - How to set read timeout in the Java client library


I am using Spark to load data into BigQuery. The idea is to read data from S3, process it with Spark, and use the BigQuery client API to load the data. Below is the code that inserts into BigQuery.

    val bq = createAuthorizedClientWithDefaultCredentialsFromStream(appName, credentialStream)
    val bqJob = bq.jobs().insert(pid, job, data).execute() // data is the input stream content

With this approach, I am seeing a lot of SocketTimeoutException errors.

    Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
        at sun.security.ssl.InputRecord.read(InputRecord.java:503)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
        at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:911)
        at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:703)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1534)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
        at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:37)
        at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
        at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
        at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:562)
        at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:419)
        at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)

It looks like a delay in reading from S3 causes the Google HTTP client to time out. I wanted to increase the timeout and tried the options below.

    val req = bq.jobs().insert(pid, job, data).buildHttpRequest()
    req.setReadTimeout(3 * 60 * 1000)
    val res = req.execute()

But this causes a precondition failure in the BigQuery client. It expects the media uploader to be null, though I am not sure why.

    Exception in thread "main" java.lang.IllegalArgumentException
        at com.google.api.client.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
        at com.google.api.client.util.Preconditions.checkArgument(Preconditions.java:37)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:297)

This led me to try the second insert API on BigQuery, the one without a media content argument:

    val req = bq.jobs().insert(pid, job).buildHttpRequest().setReadTimeout(3 * 60 * 1000).setContent(data)
    val res = req.execute()

This time it failed with a different error.

    Exception in thread "main" com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
    {
      "code" : 400,
      "errors" : [ {
        "domain" : "global",
        "message" : "Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0: ",
        "reason" : "invalid"
      } ],
      "message" : "Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0: "
    }

Please suggest how I can set the timeout, and point out anything I am doing wrong.

I'll answer the main question from the title: how to set timeouts using the Java client library.

To set timeouts, you need a custom HttpRequestInitializer configured in your client. For example:

    // READ_TIMEOUT and CONNECTION_TIMEOUT are int constants in milliseconds, defined elsewhere.
    Bigquery.Builder builder =
        new Bigquery.Builder(new UrlFetchTransport(), new JacksonFactory(), credential);
    // Keep the initializer that is already configured (here, the credential) and wrap it.
    final HttpRequestInitializer existing = builder.getHttpRequestInitializer();
    builder.setHttpRequestInitializer(new HttpRequestInitializer() {
        @Override
        public void initialize(HttpRequest request) throws IOException {
          existing.initialize(request);
          request
              .setReadTimeout(READ_TIMEOUT)
              .setConnectTimeout(CONNECTION_TIMEOUT);
        }
    });
    Bigquery client = builder.build();
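To make the meaning of the read timeout concrete: in the question's stack trace the request goes through NetHttpRequest and the JDK's HttpURLConnection, and the exception is thrown while the client waits for the response headers. The following is only a minimal JDK-only sketch of that behaviour, not code from the Google client library. The class name ReadTimeoutDemo, the method timesOut, and the delay values are all made up for illustration. It starts a local server that answers late and shows that a short read timeout produces a SocketTimeoutException while a longer one does not.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here is part of the BigQuery client.
class ReadTimeoutDemo {

    /**
     * Starts a throwaway local HTTP server that waits serverDelayMs before
     * answering, then sends one GET with the given read timeout.
     * Returns true if the client gave up with a SocketTimeoutException.
     */
    static boolean timesOut(int serverDelayMs, int readTimeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))) {
            Thread slowServer = new Thread(() -> {
                try (Socket socket = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
                    String line;
                    // Consume the request headers up to the blank line.
                    while ((line = in.readLine()) != null && !line.isEmpty()) {
                        // nothing to do with the headers
                    }
                    Thread.sleep(serverDelayMs); // simulate a slow backend
                    OutputStream out = socket.getOutputStream();
                    out.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"
                            .getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                } catch (IOException | InterruptedException ignored) {
                    // The client may already have given up; nothing to clean up.
                }
            });
            slowServer.setDaemon(true);
            slowServer.start();

            URL url = URI.create("http://127.0.0.1:" + server.getLocalPort() + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(5_000);      // time allowed to establish the connection
            conn.setReadTimeout(readTimeoutMs); // time allowed to wait for response bytes
            try {
                conn.getResponseCode();         // blocks until headers arrive or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("slow server, short timeout -> timed out: " + timesOut(1_500, 200));
        System.out.println("fast server, long timeout  -> timed out: " + timesOut(0, 5_000));
    }
}
```

The connect timeout only covers establishing the connection, so a server that accepts quickly but responds slowly is governed by the read timeout alone.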

I don't think raising the timeouts will solve all the issues you are facing. Here are a few ideas that might be helpful, but I don't fully understand your scenario, so these may be off track:

  • If you are moving large files: consider staging them on GCS before loading them into BigQuery.
  • If you are using media upload to send the data with your request: these uploads can't be too large, or you risk timeouts or network connection failures.
  • If you are running an embarrassingly parallel data migration, and the data chunks are relatively small, bigquery.tabledata.insertAll may be more appropriate for large fan-in scenarios like this. See https://cloud.google.com/bigquery/streaming-data-into-bigquery for more details.
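For the last idea, the data has to be cut into small requests before it is sent. The sketch below shows only that chunking step, under stated assumptions: SmallBatchSender, partition and sendInBatches are hypothetical names, the batch size of 500 is an arbitrary example rather than a documented limit, and the Consumer is a stand-in for whatever actually sends a batch (for instance a call to bigquery.tabledata().insertAll(...).execute() in the real client).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper; the sink stands in for a real streaming-insert call.
class SmallBatchSender {

    /** Splits rows into consecutive batches of at most maxBatchSize elements. */
    static <T> List<List<T>> partition(List<T> rows, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    /** Hands each batch to the sink and returns the number of requests made. */
    static <T> int sendInBatches(List<T> rows, int maxBatchSize, Consumer<List<T>> sink) {
        int requests = 0;
        for (List<T> batch : partition(rows, maxBatchSize)) {
            sink.accept(batch); // one small request per batch
            requests++;
        }
        return requests;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            rows.add(i);
        }
        int requests = sendInBatches(rows, 500,
                batch -> System.out.println("sending " + batch.size() + " rows"));
        System.out.println(requests + " requests");
    }
}
```

Keeping each request small means a single slow or failed call affects only one batch, which can be retried on its own.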

Thanks for the question!

