python - scikit-learn SVM raises ValueError: X.shape[1] = 181 should be equal to 865, the number of features at training time?


I am training an SVM on a dataset that combines string and numeric columns. Below are my code, my test set and my training set. When I run the code, it throws the following error.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp
import sklearn.preprocessing
import sklearn.decomposition
import sklearn.linear_model
import sklearn.pipeline
import sklearn.metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import svm


class UniRank(object):
    vect = CountVectorizer(ngram_range=(1, 3))

    def __init__(self, trained_data_csv, sep=","):
        df = pd.read_csv(trained_data_csv, sep=sep)
        sample = df[['name', 'location']]
        sample = sample.apply(lambda col: col.str.strip())
        # convert characters to a matrix
        train = sp.hstack(sample.apply(lambda col: self.vect.fit_transform(col)))
        values = df[['score']]
        values.to_records()
        # feature selection and manipulation
        self.clf = svm.SVC(gamma=0.001, C=100)
        x, y = train, values
        # applying the model
        self.clf.fit(x, y)

    def test(self, test_data_csv, sep=","):
        df = pd.read_csv(test_data_csv, sep=sep)
        sample = df[['name', 'location']]
        sample = sample.apply(lambda col: col.str.strip())
        test = sp.hstack(sample.apply(lambda col: self.vect.fit_transform(col)))
        return self.clf.predict(test)


if __name__ == '__main__':
    ur = UniRank('/home/maitreyee/documents/rdata/classifyuniwithr/version2/scored_collg1.csv')
    print ur.test('/home/maitreyee/documents/rdata/classifyuniwithr/version2/test1uni.csv')
```
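In the class above, `test()` calls `self.vect.fit_transform` again, which builds a brand-new vocabulary from the test rows, so the test matrix ends up with a different number of columns than the training matrix. The class-level `vect` is also shared by both text columns and refitted for each one. A minimal sketch of one possible fix, assuming the data is passed as DataFrames instead of CSV paths: the hypothetical `UniRankFixed` class keeps one vectorizer per column, fits it only on the training data and calls `transform()` at test time. The toy rows reuse names from the question, but the `score` labels are invented.

```python
import pandas as pd
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import svm

TEXT_COLS = ['name', 'location']


class UniRankFixed(object):
    """Hypothetical variant of the class above: takes DataFrames instead of CSV
    paths, keeps one vectorizer per text column, fits them on training data only."""

    def __init__(self, train_df):
        # Instance-level vectorizers, one per column, so each keeps its own vocabulary
        self.vects = {col: CountVectorizer(ngram_range=(1, 3)) for col in TEXT_COLS}
        X = sp.hstack([self.vects[col].fit_transform(train_df[col].str.strip())
                       for col in TEXT_COLS]).tocsr()
        # ravel() turns the (n_samples, 1) column into the 1-D array SVC expects
        y = train_df[['score']].values.ravel()
        self.clf = svm.SVC(gamma=0.001, C=100)
        self.clf.fit(X, y)

    def test(self, test_df):
        # transform(), not fit_transform(): map test text onto the training
        # vocabulary so the matrix has as many columns as at training time
        X = sp.hstack([self.vects[col].transform(test_df[col].str.strip())
                       for col in TEXT_COLS]).tocsr()
        return self.clf.predict(X)


# Hypothetical toy data: names/locations from the question, labels invented
train = pd.DataFrame({
    'name': ['indian institute of technology (iitdelhi)',
             'indian institute of technology (iitbombay)',
             'wilson college arts', 'scottish church college arts'],
    'location': ['delhi', 'bombay', 'mumbai', 'kolkata'],
    'score': [100, 100, 50, 50],
})
test = pd.DataFrame({'name': ['psg college of arts & science arts'],
                     'location': ['coimbatore']})

pred = UniRankFixed(train).test(test)
print(pred)
```

Words that never appeared in training (such as `coimbatore` here) simply produce all-zero columns instead of changing the matrix width.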

And I get the following error when I run the above script:

```
/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py:472: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y_ = column_or_1d(y, warn=True)
Traceback (most recent call last):
  File "uni_rank2.py", line 40, in <module>
    print ur.test('/home/maitreyee/documents/rdata/classifyuniwithr/version2/test1uni.csv')
  File "uni_rank2.py", line 35, in test
    return self.clf.predict(test)
  File "/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py", line 500, in predict
    y = super(BaseSVC, self).predict(X)
  File "/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py", line 290, in predict
    X = self._validate_for_predict(X)
  File "/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py", line 443, in _validate_for_predict
    (n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 181 should be equal to 865, the number of features at training time
```
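The two numbers in the error are the column counts of the test and training matrices. A minimal reproduction, using a few names from the training and test sets below (the choice of rows is arbitrary): fitting a fresh `CountVectorizer` on the test text yields a different width, while `transform()` on the vectorizer fitted during training keeps the training width.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_names = ["indian institute of technology (iitdelhi)",
               "indian institute of technology (iitbombay)"]
test_names = ["wilson college arts"]

vect = CountVectorizer(ngram_range=(1, 3))
X_train = vect.fit_transform(train_names)

# Refitting on the test text builds a new vocabulary, hence a new width
X_test_refit = CountVectorizer(ngram_range=(1, 3)).fit_transform(test_names)

# transform() reuses the training vocabulary, so the width matches X_train
X_test = vect.transform(test_names)

print(X_train.shape[1], X_test_refit.shape[1], X_test.shape[1])  # 15 6 15
```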

My training set:

```
index,name,location,loc_val,sal_val,mean_rank,mean_score,sum_score,score
0,indian institute of technology (iitdelhi),delhi,0.0128,1.028125,0.0162,0.352375,1.057125,100
1,indian institute of technology (iitdelhi),delhi,0.0128,0.990625,0.0162,0.339875,1.019625,100
2,indian institute of technology (iitdelhi),delhi,0.0128,0.959375,0.0162,0.3294583333,0.988375,100
3,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
4,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
5,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
6,indian institute of technology (iitkharagpur),kharagpur,0.0176,0.991875,0.022275,0.3439166667,1.03175,100
7,indian institute of technology (iitkharagpur),kharagpur,0.0176,1.0125,0.022275,0.3507916667,1.052375,100
8,indian institute of technology (iitkharagpur),kharagpur,0.0176,0.95375,0.022275,0.3312083333,0.993625,100
9,indian institute of technology (iitmadras),madras,0.0224,0.9875,0.02835,0.3460833333,1.03825,100
```

Test set:

```
index,name,location,location_val,salary_val,mean_rank,mean_score,sum_score
254,gandhi institute of technology and management engineering,vishakapatnam,0.0096,0.4925,0.5508,0.3509666667,1.0529
255,cochin university of science and technology engineering,cochin,0.0112,0.296875,0.62775,0.3119416667,0.935825
256,cochin university of science and technology engineering,cochin,0.0112,0.443125,0.62775,0.3606916667,1.082075
257,cochin university of science and technology engineering,cochin,0.0112,0.296875,0.62775,0.3119416667,0.935825
258,kc college of arts science & commerce arts,lucknow,0.008,0.21875,0.32805,0.1849333333,0.5548
259,faculty of arts university of lucknow arts,lucknow,0.0032,0.21875,0.3483,0.1900833333,0.57025
260,scottish church college arts,kolkata,0.0192,0.21875,0.3564,0.1981166667,0.59435
261,l.d. arts college arts,ahmedabad,0.0112,0,0.3645,0.1252333333,0.3757
262,st. francis college women arts,hyderabad,0.0112,0.125,0.2997,0.1453,0.4359
263,wilson college arts,mumbai,0.008,0.125,0.3807,0.1712333333,0.5137
264,psg college of arts & science arts,coimbatore,0.0064,0.125,0.3888,0.1734,0.5202
```

I am using an SVM on string data in Python for the first time. If there are better ways to train an SVM on a string dataset, kindly let me know.
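One commonly used alternative, sketched here as an assumption rather than the asker's setup: wrap the feature extraction in scikit-learn's `ColumnTransformer` (available from scikit-learn 0.20, newer than the Python 2.7-era version in the traceback) inside a `Pipeline`. The vectorizers are then fitted once in `fit()` and only applied in `predict()`, and the numeric columns can be used alongside the text. The toy rows and `score` labels are hypothetical, and renaming `location_val`/`salary_val` to `loc_val`/`sal_val` assumes the differently named test columns hold the same quantities as the training ones.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

NUMERIC_COLS = ['loc_val', 'sal_val', 'mean_rank', 'mean_score', 'sum_score']

features = ColumnTransformer([
    # A single column name (not a list) passes 1-D text to CountVectorizer
    ('name', CountVectorizer(ngram_range=(1, 3)), 'name'),
    ('location', CountVectorizer(), 'location'),
    # Scale numeric columns so they are comparable to the count features
    ('numeric', StandardScaler(), NUMERIC_COLS),
])
model = Pipeline([('features', features), ('svc', SVC(gamma=0.001, C=100))])

# Hypothetical toy training data with invented labels
train = pd.DataFrame({
    'name': ['indian institute of technology (iitdelhi)',
             'indian institute of technology (iitbombay)',
             'wilson college arts', 'scottish church college arts'],
    'location': ['delhi', 'bombay', 'mumbai', 'kolkata'],
    'loc_val': [0.0128, 0.008, 0.008, 0.0192],
    'sal_val': [1.028125, 1.0, 0.125, 0.21875],
    'mean_rank': [0.0162, 0.02025, 0.3807, 0.3564],
    'mean_score': [0.352375, 0.34275, 0.1712333333, 0.1981166667],
    'sum_score': [1.057125, 1.02825, 0.5137, 0.59435],
    'score': [100, 100, 50, 50],
})
X_train = train.drop(columns='score')
model.fit(X_train, train['score'])

# One row in the test-set layout; align column names and order with training
test = pd.DataFrame({
    'name': ['psg college of arts & science arts'], 'location': ['coimbatore'],
    'location_val': [0.0064], 'salary_val': [0.125], 'mean_rank': [0.3888],
    'mean_score': [0.1734], 'sum_score': [0.5202],
})
test = test.rename(columns={'location_val': 'loc_val', 'salary_val': 'sal_val'})
test = test[X_train.columns]

pred = model.predict(test)
print(pred)
```

Because the pipeline stores the fitted vocabularies, calling `predict()` on new data can never change the number of features.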

