Using Google Dataproc to import CSV data into Bigtable


I'm trying to use a Dataproc cluster to import large CSV files into HDFS, export them to SequenceFile format, and then import the latter into Bigtable as described here: https://cloud.google.com/bigtable/docs/exporting-importing

I imported the CSV files into an external table in Hive, then exported them by inserting them into a SequenceFile-backed table.

However (probably because Dataproc seems to ship with Hive 1.0?), I hit the cast exception error mentioned here: Bigtable import error

I also can't seem to get the HBase shell or ZooKeeper running on the Dataproc master VM, so I can't run a simple export job from the CLI.

  1. Is there an alternative way to export Bigtable-compatible sequence files from Dataproc?

  2. What's the proper configuration to get HBase and ZooKeeper running on the Dataproc master VM node?

The import instructions you linked are for importing data from an existing HBase deployment.

If the input format you're working with is CSV, creating SequenceFiles is an unnecessary step. How about writing a Hadoop MapReduce job to process the CSV files and write directly to Cloud Bigtable? Dataflow would also be a good fit here.

Take a look at the samples here: https://github.com/googlecloudplatform/cloud-bigtable-examples/tree/master/java
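For illustration, here is a minimal sketch of that MapReduce approach, assuming the bigtable-hbase adapter jar is on the job classpath. The project id, instance id, table name, the cf column family, and the row-key-in-first-column CSV layout are all placeholder assumptions, and the exact BigtableConnection class name depends on your bigtable-hbase version:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class CsvToBigtable {

        // Turns each CSV line into a Put keyed on the first column (placeholder schema).
        static class CsvMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

            private static final byte[] FAMILY = Bytes.toBytes("cf"); // hypothetical column family

            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = line.toString().split(",");
                if (fields.length < 2) {
                    return; // skip malformed rows
                }
                byte[] rowKey = Bytes.toBytes(fields[0]);
                Put put = new Put(rowKey);
                for (int i = 1; i < fields.length; i++) {
                    put.addColumn(FAMILY, Bytes.toBytes("col" + i), Bytes.toBytes(fields[i]));
                }
                ctx.write(new ImmutableBytesWritable(rowKey), put);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Route HBase client calls to Bigtable through the bigtable-hbase adapter;
            // the class name below varies with the adapter version.
            conf.set("hbase.client.connection.impl",
                    "com.google.cloud.bigtable.hbase1_x.BigtableConnection");
            conf.set("google.bigtable.project.id", "my-project");   // placeholder
            conf.set("google.bigtable.instance.id", "my-instance"); // placeholder
            conf.set(TableOutputFormat.OUTPUT_TABLE, "my-table");   // placeholder

            Job job = Job.getInstance(conf, "csv-to-bigtable");
            job.setJarByClass(CsvToBigtable.class);
            job.setMapperClass(CsvMapper.class);
            job.setNumReduceTasks(0); // map-only: Puts go straight to the output format
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TableOutputFormat.class);
            job.setOutputKeyClass(ImmutableBytesWritable.class);
            job.setOutputValueClass(Put.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

With the connection implementation pointed at Bigtable, TableOutputFormat sends the Puts to Bigtable the same way it would to HBase, so no SequenceFile intermediate step is needed.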

