python - Update list of unique items in a dictionary


I am creating a JSON file that holds the unique values of each column of a CSV. What I'm doing right now is generating a dictionary in which the unique values of each column are stored as a separate entry (the column name being the key).

I have to download a new version of the CSV regularly and update the meta-data JSON. My current plan is to download the latest update of the CSV (we're using Elasticsearch), read off the unique values from that CSV, update the meta-data JSON, and then concatenate the new and old CSVs.

Questions:

  1. Is there a more efficient way to do this? The old CSV is ~10 GB, with 51M rows and 1400 columns; it takes a day to generate the JSON. Here's my current code:


    import sys
    import json
    import datetime

    import numpy as np
    import pandas as pd

    filename = sys.argv[1]   # input CSV
    json_file = sys.argv[2]  # output meta-data JSON

    def get_col_stats(colname, numrows=None):
        # Read only the requested column from the CSV.
        print('start reading ' + colname)
        df = pd.read_csv(filename, engine='c', usecols=[colname], nrows=numrows)
        print('finished reading ' + colname)

        df.columns = ['col']
        uniq = list(df.col.unique())
        count = len(uniq)
        print('unique count is', count, '\n')

        if colname in ['orderyear', 'faultdate', 'faultactivetime']:
            return {'type': 'date', 'min': df.col.dropna().min(), 'max': df.col.dropna().max()}
        elif count < 1000 or colname == 'faultcode':
            return {'type': 'factor', 'uniq': uniq}
        else:
            return {'type': 'continuous', 'min': df.col.dropna().min(), 'max': df.col.dropna().max()}

    def default(o):
        # json cannot serialize numpy integers, so convert them to int.
        if isinstance(o, np.integer):
            return int(o)
        raise TypeError

    col_list = list(pd.read_csv(filename, nrows=1).columns)
    print(col_list[1:50])

    d = {}

    for i in col_list:
        d[i] = get_col_stats(i, numrows=None)
        print('made ' + i)
        # Rewrite the JSON after every column so progress is not lost.
        with open(json_file, 'w') as fp:
            json.dump(d, fp, default=default)

  2. Is there a better way to update the dictionary of unique values than this:


    dic = {'a': [1,2,3], 'b': [3,4,5]}
    dic['a'].extend([2,3,4])
    dic['a'] = list(set(dic['a']))
    dic

I'm not sure about the first question, as I'm not familiar with pandas. For question 2, it's easier to do:

    dic = {'a': [1,2,3], 'b': [3,4,5]}
    dic['a'] = list(set(dic['a'] + [2,3,4]))
    dic
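If the same union has to be applied to the whole meta-data dictionary, something along these lines might work. This is only a sketch: `merge_unique` and `merge_col_stats` are hypothetical helper names, the entry layout (`type`, `uniq`, `min`, `max`) is copied from the question's code, and it assumes the values are hashable and that a column keeps the same type between the old and new CSV. Unlike `list(set(...))`, it keeps the existing order of the values, which may make the JSON easier to diff.

```python
def merge_unique(old_values, new_values):
    """Return old_values followed by any values from new_values not seen yet."""
    seen = set(old_values)
    merged = list(old_values)
    for value in new_values:
        if value not in seen:
            seen.add(value)
            merged.append(value)
    return merged


def merge_col_stats(old, new):
    """Merge one column's entry from the new CSV into the existing entry."""
    if old['type'] != new['type']:
        # Assumption: a type change is treated as an error, not resolved here.
        raise ValueError('column type changed between old and new meta-data')
    if old['type'] == 'factor':
        return {'type': 'factor',
                'uniq': merge_unique(old['uniq'], new['uniq'])}
    # 'date' and 'continuous' entries only store a range.
    return {'type': old['type'],
            'min': min(old['min'], new['min']),
            'max': max(old['max'], new['max'])}


# Made-up example data, for illustration only.
old_meta = {'a': {'type': 'factor', 'uniq': [1, 2, 3]},
            'x': {'type': 'continuous', 'min': 0, 'max': 10}}
new_meta = {'a': {'type': 'factor', 'uniq': [2, 3, 4]},
            'x': {'type': 'continuous', 'min': -5, 'max': 7}}

merged = {col: merge_col_stats(old_meta[col], new_meta[col])
          for col in old_meta}
print(merged)
```

With this approach only the new CSV would need to be scanned on each update, since the old values are already in the JSON.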
