Using Google Dataproc to import CSV data into Bigtable
I'm trying to use a Dataproc cluster instance to import large CSV files into HDFS, export them to SequenceFile format, and then import them into Bigtable as described here: https://cloud.google.com/bigtable/docs/exporting-importing
I imported the CSV files into an external table in Hive, then exported them by inserting them into a SequenceFile-backed table.
However (probably because Dataproc seems to ship with Hive 1.0?), I ran into the cast exception mentioned here: Bigtable import error
I can't seem to get the HBase shell or ZooKeeper running on the Dataproc master VM, so I can't run the simple export job from the CLI.
Is there an alternative way to export Bigtable-compatible sequence files from Dataproc?
What's the proper configuration to get HBase and ZooKeeper running on the Dataproc master VM node?
The import instructions you linked are for importing data from an existing HBase deployment.
If the input format you're working with is CSV, creating SequenceFiles is an unnecessary step. How about writing a Hadoop MapReduce job that processes the CSV files and writes directly to Cloud Bigtable? Dataflow would also be a good fit here.
Take a look at the samples here: https://github.com/googlecloudplatform/cloud-bigtable-examples/tree/master/java
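As a rough illustration of the MapReduce approach, here is a minimal, map-only sketch that parses each CSV line into an HBase Put and writes it through TableOutputFormat. It assumes the Cloud Bigtable HBase connector (bigtable-hbase) is on the classpath and configured with your project and instance IDs; the table name, the column family "cf", the "first column is the row key" convention, and the naive comma split are placeholders for illustration, not part of the linked samples.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class CsvToBigtable {

  // Map-only job: each CSV line becomes one Put, written straight to the table.
  static class CsvMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    private static final byte[] FAMILY = Bytes.toBytes("cf"); // placeholder column family

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // Naive split; use a real CSV parser if your data has quoted fields.
      String[] fields = line.toString().split(",");
      if (fields.length < 2) {
        return; // skip malformed rows
      }
      byte[] rowKey = Bytes.toBytes(fields[0]); // assumption: first column is the row key
      Put put = new Put(rowKey);
      for (int i = 1; i < fields.length; i++) {
        put.addColumn(FAMILY, Bytes.toBytes("col" + i), Bytes.toBytes(fields[i]));
      }
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Point the HBase client at Bigtable. The exact property names depend on the
    // bigtable-hbase connector version, so verify them against the examples repo.
    conf.set("google.bigtable.project.id", "my-project");   // placeholder
    conf.set("google.bigtable.instance.id", "my-instance"); // placeholder
    conf.set(TableOutputFormat.OUTPUT_TABLE, "my-table");   // placeholder table name

    Job job = Job.getInstance(conf, "csv-to-bigtable");
    job.setJarByClass(CsvToBigtable.class);
    job.setMapperClass(CsvMapper.class);
    job.setNumReduceTasks(0); // no shuffle needed; mappers write rows as they read them
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TableOutputFormat.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path(args[0])); // e.g. an HDFS or GCS path of CSVs

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Keeping the job map-only avoids a shuffle entirely, which is why it sidesteps the SequenceFile step: the CSV never has to be rewritten into an intermediate format before it reaches Bigtable.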