python - Out of sample prediction in statsmodel with NaN values

May 15, 2014

I have a dataset comprised of various values concerning auto_sales in the USA.

I'm trying to predict auto_sales for October 2015 using a simple OLS regression.

    df2 = pd.read_csv('paul_data/question12_prediction_data.csv')
    window_size = 7  # -1 due to zero-indexing of the array
    window = df2.ix[0:window_size,:]
    print window

    result = sm.ols(formula="log_sales ~ log_sales_l2 + vehicleshopping_l2 + vehiclebrand_l2 + actual_sales_edmunds_l1 + issummer + iswinter", data=df2).fit()
    print result.predict()[df2[(df2.month == 10) & (df2.year == 2015)].index[0]]

The window contains the following data:

       year  month  auto_sales  log_sales  log_sales_l1  log_sales_l2  \
    0  2015      3       83352  11.330828     11.294807     11.317823
    1  2015      4       83871  11.337035     11.330828     11.294807
    2  2015      5       85489  11.356143     11.337035     11.330828
    3  2015      6       84123  11.340035     11.356143     11.337035
    4  2015      7       85320  11.354164     11.340035     11.356143
    5  2015      8         NaN        NaN     11.354164     11.340035
    6  2015      9         NaN        NaN           NaN     11.354164
    7  2015     10         NaN        NaN           NaN           NaN

       log_sales_l3  gt_vehicleshopping  gt_vehiclemaintenance  gt_suvs  \
    0     11.313523              0.1320                  0.694   0.0680
    1     11.317823              0.1150                  0.745   0.0525
    2     11.294807              0.1060                  0.754   0.0560
    3     11.330828              0.0950                  0.785   0.0550
    4     11.337035              0.1025                  0.870   0.1075
    5     11.356143              0.1140                  0.794   0.1240
    6     11.340035                 NaN                    NaN      NaN
    7           NaN                 NaN                    NaN      NaN

       ...  vansminivans_l2  iswinter  issummer  vehiclebrands  \
    0  ...           0.0900         1         0           0.08
    1  ...           0.1250         0         0           0.09
    2  ...           0.1580         0         0           0.09
    3  ...           0.1750         0         1           0.12
    4  ...           0.1920         0         1           0.17
    5  ...           0.2100         0         1            NaN
    6  ...           0.2175         0         0            NaN
    7  ...              NaN       NaN       NaN            NaN

       vehiclebrand_l1  vehiclebrand_l2  actual_sales_edmunds  edmund_forecast  \
    0             0.05             0.03               1542841          1522881
    1             0.08             0.05               1451790          1464176
    2             0.09             0.08               1631234          1591221
    3             0.09             0.09               1473142          1484487
    4             0.12             0.09               1507643          1478025
    5             0.17             0.12               1573573          1538958
    6              NaN             0.17                   NaN              NaN
    7              NaN              NaN                   NaN              NaN

       actual_sales_edmunds_l1  edmund_forecast_l1
    0                  1255458             1285019
    1                  1542841             1522881
    2                  1451790             1464176
    3                  1631234             1591221
    4                  1473142             1484487
    5                  1507643             1478025
    6                  1573573             1538958
    7                      NaN                 NaN

    [8 rows x 32 columns]

However, I get the following error:

    IndexError                                Traceback (most recent call last)
    <ipython-input-83-16bf72335e7f> in <module>()
          5
          6 result = sm.ols(formula="log_sales ~ log_sales_l2 + vehicleshopping_l2 + vehiclebrand_l2 + actual_sales_edmunds_l1 + issummer + iswinter", data=df2).fit()
    ----> 7 print result.predict()[df2[(df2.month == 10) & (df2.year == 2015)].index[0]]
          8 #np.exp(result.predict(df2.ix[x+(window_size)]))

    IndexError: index 7 is out of bounds for axis 0 with size 5

I'm not sure how to proceed at this point. I understand that I am trying to do an out-of-sample prediction, but everything I've tried so far has failed to solve the issue.

Your problem, I believe, is that the data you are regressing on has only 5 entries in which none of the inputs is NaN. Therefore this:

result.predict()

returns an array of 5 elements, while this:

df2[(df2.month == 10) & (df2.year == 2015)].index[0]

returns 7, because the slicing you are performing returns one row, which corresponds to the 8th row of the original DataFrame. You are asking "give me the 8th element of an array of length 5", and therefore it breaks.
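One possible way around this, sketched on made-up data rather than the file from the question, is to pass the row you want forecast to predict() instead of indexing into the in-sample array. The DataFrame, the column names x and y, and the alias smf are all assumptions for illustration; the question imports the formula interface under the name sm.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf  # alias assumed; the question uses `sm`

# Hypothetical toy data: y is missing for the last three rows,
# while the predictor x is known for every row.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1, np.nan, np.nan, np.nan],
})

# The formula interface drops rows with NaN in any model variable,
# so only the first five rows take part in the fit.
result = smf.ols(formula="y ~ x", data=df).fit()

# With no arguments, predict() returns fitted values for the rows
# actually used, so its length is 5 and position 7 does not exist.
in_sample = result.predict()
n_fitted = len(in_sample)

# Out-of-sample: select the target row by label and hand its
# predictors to predict().
target = df.loc[[7], ["x"]]
forecast = result.predict(target)
forecast_value = float(np.asarray(forecast)[0])

print(n_fitted, forecast_value)
```

This only yields a number when the predictors for the target row are themselves known. In the window shown in the question, the October row is NaN in log_sales_l2, vehiclebrand_l2 and actual_sales_edmunds_l1 as well, so those would need values (or a model with longer lags) before any forecast for that row can be computed.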
