python 3.x - Remove duplicates in a csv file based on two columns?
I have a CSV file that must be read and have its duplicate values removed before it gets written back out.
A value counts as a duplicate based on two columns (date, price) together with a conditional statement. In the example below, rows 1, 2, and 4 should be written to the CSV. Row 3 qualifies as a duplicate (since its date and price match row 1) and is excluded (not written to the CSV).
address       floor   date        price
40 b street   18      3/29/2015   2200000
40 b street   23      1/7/2015    999000
40 b street   18      3/29/2015   2200000
40 b street   18      4/29/2015   2200000
You can use DictReader and DictWriter to accomplish this task.
import csv

def main():
    """Read the CSV file, delete duplicates, and write it back out."""
    with open('test.csv', 'r', newline='') as inputfile:
        with open('testout.csv', 'w', newline='') as outputfile:
            duplicatereader = csv.DictReader(inputfile, delimiter=',')
            uniquewrite = csv.DictWriter(outputfile,
                                         fieldnames=['address', 'floor', 'date', 'price'],
                                         delimiter=',')
            uniquewrite.writeheader()
            keysread = []
            for row in duplicatereader:
                # Deduplicate on the (date, price) pair only
                key = (row['date'], row['price'])
                if key not in keysread:
                    print(row)
                    keysread.append(key)
                    uniquewrite.writerow(row)

if __name__ == '__main__':
    main()
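If the input file is large, membership tests against a list slow down as more keys accumulate; a set gives constant-time lookups. Below is a minimal sketch of the same approach using a set; the file names and the assumption that the fields can be taken from the reader's header are illustrative, so adjust them to your data.

import csv

def remove_duplicates(infile='test.csv', outfile='testout.csv'):
    """Copy rows from infile to outfile, skipping rows whose (date, price) pair was already seen."""
    with open(infile, 'r', newline='') as src, open(outfile, 'w', newline='') as dst:
        reader = csv.DictReader(src, delimiter=',')
        # Reuse the header of the input file instead of hard-coding the field names
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter=',')
        writer.writeheader()
        seen = set()   # (date, price) pairs already written
        for row in reader:
            key = (row['date'], row['price'])
            if key not in seen:
                seen.add(key)
                writer.writerow(row)

if __name__ == '__main__':
    remove_duplicates()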