python - scikit SVM returns ValueError: X.shape[1] = 181 should be equal to 865, the number of features at training time?


I am training an SVM on a dataset that combines string and numeric columns. My code and the training and test sets are below. When I run the code, it throws the following error.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp
import sklearn.preprocessing
import sklearn.decomposition
import sklearn.linear_model
import sklearn.pipeline
import sklearn.metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import svm


class unirank(object):

    vect = CountVectorizer(ngram_range=(1, 3))

    def __init__(self, trained_data_csv, sep=","):
        df = pd.read_csv(trained_data_csv, sep=sep)
        sample = df[['name', 'location']]
        sample = sample.apply(lambda col: col.str.strip())
        # convert characters to a matrix
        train = sp.hstack(sample.apply(lambda col: self.vect.fit_transform(col)))
        values = df[['score']]
        values.to_records()
        # feature selection and manipulation
        self.clf = svm.SVC(gamma=0.001, C=100)
        X, y = train, values
        # applying the model
        self.clf.fit(X, y)

    def test(self, test_data_csv, sep=","):
        df = pd.read_csv(test_data_csv, sep=sep)
        sample = df[['name', 'location']]
        sample = sample.apply(lambda col: col.str.strip())
        test = sp.hstack(sample.apply(lambda col: self.vect.fit_transform(col)))
        return self.clf.predict(test)


if __name__ == '__main__':
    ur = unirank('/home/maitreyee/documents/rdata/classifyuniwithr/version2/scored_collg1.csv')
    print ur.test('/home/maitreyee/documents/rdata/classifyuniwithr/version2/test1uni.csv')
```
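In the code above, `vect` is a single class-level `CountVectorizer` that is refitted once per column inside `sample.apply(...)`, so after the call only the vocabulary of the last column (`location`) is kept. A minimal sketch of that behaviour, using a few strings copied from the training set (all variable names here are illustrative, not from the original script):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two text columns, with values copied from the training set in the question.
names = ["indian institute of technology (iitdelhi)",
         "indian institute of technology (iitbombay)"]
locations = ["delhi", "bombay"]

# One shared vectorizer, like the class attribute `vect` in the question.
shared = CountVectorizer(ngram_range=(1, 3))
name_matrix = shared.fit_transform(names)          # learns the name n-grams
location_matrix = shared.fit_transform(locations)  # refits: vocabulary is replaced
location_vocab = set(shared.vocabulary_)

# After the second call the shared vectorizer only knows the location tokens.
print(sorted(location_vocab))  # ['bombay', 'delhi']

# Separate vectorizers keep one vocabulary per column.
name_vect = CountVectorizer(ngram_range=(1, 3)).fit(names)
location_vect = CountVectorizer(ngram_range=(1, 3)).fit(locations)
```

Keeping one fitted vectorizer per column is what makes it possible to transform the test columns later with the same vocabularies.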

I get the following error when I run the above script:

```
/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py:472: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y_ = column_or_1d(y, warn=True)
Traceback (most recent call last):
  File "uni_rank2.py", line 40, in <module>
    print ur.test('/home/maitreyee/documents/rdata/classifyuniwithr/version2/test1uni.csv')
  File "uni_rank2.py", line 35, in test
    return self.clf.predict(test)
  File "/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py", line 500, in predict
    y = super(BaseSVC, self).predict(X)
  File "/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py", line 290, in predict
    X = self._validate_for_predict(X)
  File "/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py", line 443, in _validate_for_predict
    (n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 181 should be equal to 865, the number of features at training time
```
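The column counts differ because `test()` calls `fit_transform` again, which builds a brand-new vocabulary from the test strings (181 columns) instead of reusing the one behind the 865 training columns. A minimal sketch of the usual alternative, fitting on the training text once and calling only `transform` on the test text, using a few names from the two sets (variable names are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer

train_names = ["indian institute of technology (iitdelhi)",
               "indian institute of technology (iitbombay)"]
test_names = ["wilson college arts",
              "scottish church college arts"]

# Refitting on the test text gives a vocabulary (and width) of its own.
refit_width = CountVectorizer(ngram_range=(1, 3)).fit_transform(test_names).shape[1]

# Fit once on the training text, then only transform() the test text:
# the test matrix gets exactly the training columns.
vect = CountVectorizer(ngram_range=(1, 3))
X_train = vect.fit_transform(train_names)
X_test = vect.transform(test_names)

print((X_train.shape[1], X_test.shape[1], refit_width))  # (15, 15, 12)
```

Note that `transform` silently drops n-grams it never saw during fitting; here none of the test words occur in the training names, so every test row is all zeros. With names as different as the two sets above, many test features may end up empty.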

My training set:

```
index,name,location,loc_val,sal_val,mean_rank,mean_score,sum_score,score
0,indian institute of technology (iitdelhi),delhi,0.0128,1.028125,0.0162,0.352375,1.057125,100
1,indian institute of technology (iitdelhi),delhi,0.0128,0.990625,0.0162,0.339875,1.019625,100
2,indian institute of technology (iitdelhi),delhi,0.0128,0.959375,0.0162,0.3294583333,0.988375,100
3,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
4,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
5,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
6,indian institute of technology (iitkharagpur),kharagpur,0.0176,0.991875,0.022275,0.3439166667,1.03175,100
7,indian institute of technology (iitkharagpur),kharagpur,0.0176,1.0125,0.022275,0.3507916667,1.052375,100
8,indian institute of technology (iitkharagpur),kharagpur,0.0176,0.95375,0.022275,0.3312083333,0.993625,100
9,indian institute of technology (iitmadras),madras,0.0224,0.9875,0.02835,0.3460833333,1.03825,100
```
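The `DataConversionWarning` in the output comes from `values = df[['score']]`: double brackets select a one-column DataFrame, so `y` is 2-D. A minimal sketch, reading three of the training rows above from an in-memory string, showing the shape difference between the two selections:

```python
import io

import pandas as pd

# Three rows copied from the training set above.
csv_text = u"""index,name,location,loc_val,sal_val,mean_rank,mean_score,sum_score,score
0,indian institute of technology (iitdelhi),delhi,0.0128,1.028125,0.0162,0.352375,1.057125,100
3,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
6,indian institute of technology (iitkharagpur),kharagpur,0.0176,0.991875,0.022275,0.3439166667,1.03175,100
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="index")

# df[['score']] is a one-column DataFrame -> 2-D array, which triggers the warning.
y_2d = df[["score"]].values
# df['score'] is a Series -> 1-D array, the shape SVC.fit expects for y.
y_1d = df["score"].values

print((y_2d.shape, y_1d.shape))  # ((3, 1), (3,))
```

`df[["score"]].values.ravel()` gives the same 1-D result if the DataFrame form is already in hand.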

My test set:

```
index,name,location,location_val,salary_val,mean_rank,mean_score,sum_score
254,gandhi institute of technology and management engineering,vishakapatnam,0.0096,0.4925,0.5508,0.3509666667,1.0529
255,cochin university of science and technology engineering,cochin,0.0112,0.296875,0.62775,0.3119416667,0.935825
256,cochin university of science and technology engineering,cochin,0.0112,0.443125,0.62775,0.3606916667,1.082075
257,cochin university of science and technology engineering,cochin,0.0112,0.296875,0.62775,0.3119416667,0.935825
258,kc college of arts science & commerce arts,lucknow,0.008,0.21875,0.32805,0.1849333333,0.5548
259,faculty of arts university of lucknow arts,lucknow,0.0032,0.21875,0.3483,0.1900833333,0.57025
260,scottish church college arts,kolkata,0.0192,0.21875,0.3564,0.1981166667,0.59435
261,l.d. arts college arts,ahmedabad,0.0112,0,0.3645,0.1252333333,0.3757
262,st. francis college women arts,hyderabad,0.0112,0.125,0.2997,0.1453,0.4359
263,wilson college arts,mumbai,0.008,0.125,0.3807,0.1712333333,0.5137
264,psg college of arts & science arts,coimbatore,0.0064,0.125,0.3888,0.1734,0.5202
```
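The script currently uses only `name` and `location`, but if the numeric columns are added as features, note that two of them are named differently in the two files (`loc_val`/`sal_val` in training, `location_val`/`salary_val` in test). A minimal sketch, assuming the test columns should simply be renamed to the training names before selecting them (the `numeric_cols` list is illustrative):

```python
import io

import pandas as pd

# Two rows copied from the test set above.
test_csv = u"""index,name,location,location_val,salary_val,mean_rank,mean_score,sum_score
260,scottish church college arts,kolkata,0.0192,0.21875,0.3564,0.1981166667,0.59435
263,wilson college arts,mumbai,0.008,0.125,0.3807,0.1712333333,0.5137
"""

test_df = pd.read_csv(io.StringIO(test_csv), index_col="index")

# Align the test column names with the training file's names.
test_df = test_df.rename(columns={"location_val": "loc_val", "salary_val": "sal_val"})

# Select numeric features in the same order used for training.
numeric_cols = ["loc_val", "sal_val", "mean_rank", "mean_score", "sum_score"]
X_test_numeric = test_df[numeric_cols].values
```

Selecting by an explicit column list, rather than by position, keeps the train and test feature order identical.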

This is my first time using an SVM on string data in Python. If there are better ways to train an SVM on a string dataset, kindly let me know.
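One common way to combine text and numeric columns in scikit-learn (0.20 or later; the question's environment may be older) is a `ColumnTransformer` inside a `Pipeline`, so that vectorizers and scalers are fitted only during `fit()` and merely applied during `predict()`. A minimal sketch under those assumptions: the names, locations and `mean_score` values echo the question's data, but the `score` labels are hypothetical, chosen only so that the toy set has two classes.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy training rows; the "score" labels are hypothetical.
train_df = pd.DataFrame({
    "name": ["indian institute of technology (iitdelhi)",
             "indian institute of technology (iitbombay)",
             "wilson college arts",
             "scottish church college arts"],
    "location": ["delhi", "bombay", "mumbai", "kolkata"],
    "mean_score": [0.352375, 0.34275, 0.1712333333, 0.1981166667],
    "score": [100, 100, 50, 50],
})
test_df = pd.DataFrame({
    "name": ["psg college of arts & science arts"],
    "location": ["coimbatore"],
    "mean_score": [0.1734],
})

feature_cols = ["name", "location", "mean_score"]

features = ColumnTransformer([
    # A single column name (not a list) passes a 1-D Series,
    # which is what CountVectorizer expects.
    ("name", CountVectorizer(ngram_range=(1, 3)), "name"),
    ("location", CountVectorizer(), "location"),
    # Numeric columns are passed as a list (2-D) to the scaler.
    ("numeric", StandardScaler(), ["mean_score"]),
])

model = Pipeline([
    ("features", features),
    ("svc", SVC(gamma=0.001, C=100)),
])

# fit() learns vocabularies and scaling from the training rows only;
# predict() reuses them, so the feature count always matches.
model.fit(train_df[feature_cols], train_df["score"])
pred = model.predict(test_df[feature_cols])
```

Because each text column gets its own vectorizer and nothing is refitted at prediction time, both problems in the original script (the shared `vect` and the `fit_transform` on test data) are avoided by construction.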
