How to compare dates from Excel sheets for value filling purposes on python -
i have excel file 2 sheets.
one sheet containing data:
date tmax tmin 20110706 317 211 20110707 322 211 20110708 317 211 20110709 322 211 20110710 328 222 20110711 333 244 20110712 356 250 20110713 356 222 and other sheet includes:
start date end date rep month cost kwh kw 7/6/2011 8/3/2011 july 5,065.17 76,640 205 8/3/2011 9/7/2011 august 5,572.38 86,640 195 my goal write column on sheet 1 (kwh) sheet 2 depending on if date on sheet 1 falls within range of kwh.
for example:
date tmax tmin kwh 20110706 317 211 76640 20110707 322 211 76640 20110708 317 211 76640 20110709 322 211 76640 20110710 328 222 76640 20110711 333 244 76640 20110712 356 250 76640 20110713 356 222 76640 20110801 344 228 76640 20110802 356 200 76640 20110803 367 200 86640 20110804 361 228 86640 i having trouble figuring out how kind of algorithmic parsing able implement trying do.
i familiar how write file read file/cells pandas.
here code:
import pandas pd pandas import excelwriter df = pd.read_excel("thecddhddtest.xlsx",'sheet1') df2 = pd.read_excel("thecddhddtest.xlsx",'sheet2') df.head() df["date"] = pd.to_datetime(df["date"], format="%y%m%d") pd.to_datetime(df2["start date"], format="%m/%d/%y") df3 = df2.set_index("start date") df3["kwh"].reindex(df["date"], method="ffill") df["kwh"] = df3["kwh"].reindex(df["date"], method="ffill") print(df["kwh"]) writer = excelwriter('thecddhddtestkwh.xlsx') df.to_excel(writer,'sheet1',index=false) df2.to_excel(writer,'sheet2',index=false) writer.save() which results in:
date tmax tmin kwh 20110706 317 211 20110707 322 211 20110708 317 211 20110709 322 211 20110710 328 222 20110711 333 244 20110712 356 250 20110713 356 222 kwh cell empty reason
it's critical parse date columns pandas timestamps/ numpy datetime64. best way use to_datetime format.
in [11]: df out[11]: date tmax tmin 0 20110706 317 211 1 20110707 322 211 2 20110708 317 211 3 20110709 322 211 4 20110710 328 222 5 20110711 333 244 6 20110712 356 250 7 20110713 356 222 8 20110801 344 228 9 20110802 356 200 10 20110803 367 200 11 20110804 361 228 in [12]: df["date"] = pd.to_datetime(df["date"], format="%y%m%d") in [13]: df out[13]: date tmax tmin 0 2011-07-06 317 211 1 2011-07-07 322 211 2 2011-07-08 317 211 3 2011-07-09 322 211 4 2011-07-10 328 222 5 2011-07-11 333 244 6 2011-07-12 356 250 7 2011-07-13 356 222 8 2011-08-01 344 228 9 2011-08-02 356 200 10 2011-08-03 367 200 11 2011-08-04 361 228 similarly (with different format):
in [14]: pd.to_datetime(df2["start date"], format="%m/%d/%y") out[14]: 0 2011-07-06 1 2011-08-03 name: start date, dtype: datetime64[ns] now, first observation wouldn't make sense if periods not mutually exclusive. means need consider start date*.
this means can reindex seconds sheet, forward fill, , you're done:
in [21]: df3 = df2.set_index("start date") in [22]: df3 out[22]: end date rep month cost kwh kw start date 2011-07-06 8/3/2011 july 5,065.17 76,640 205 2011-08-03 9/7/2011 august 5,572.38 86,640 195 this allows reindex dates dataframe:
in [23]: df3["kwh"].reindex(df["date"], method="ffill") out[23]: date 2011-07-06 76,640 2011-07-07 76,640 2011-07-08 76,640 2011-07-09 76,640 2011-07-10 76,640 2011-07-11 76,640 2011-07-12 76,640 2011-07-13 76,640 2011-08-01 76,640 2011-08-02 76,640 2011-08-03 86,640 2011-08-04 86,640 name: kwh, dtype: object and set column in df.
in [24]: df["kwh"] = df3["kwh"].reindex(df["date"], method="ffill") *if there "empty" periods add in nan rows, corresponding "empty" start-date.
Comments
Post a Comment