python - scikit SVM returns ValueError: X.shape[1] = 181 should be equal to 865, the number of features at training time?
I am training an SVM on a dataset that combines string and numeric data. Below are the code and the test and training sets. When I run the code, it throws the following error.
    import numpy as np
    import pandas as pd
    import scipy.sparse as sp
    import sklearn.preprocessing
    import sklearn.decomposition
    import sklearn.linear_model
    import sklearn.pipeline
    import sklearn.metrics
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn import svm

    class unirank(object):
        vect = CountVectorizer(ngram_range=(1, 3))

        def __init__(self, trained_data_csv, sep=","):
            df = pd.read_csv(trained_data_csv, sep=sep)
            sample = df[['name', 'location']]
            sample = sample.apply(lambda col: col.str.strip())
            # convert the characters to a matrix
            train = sp.hstack(sample.apply(lambda col: self.vect.fit_transform(col)))
            values = df[['score']]
            values.to_records()
            # feature selection and manipulation
            self.clf = svm.SVC(gamma=0.001, C=100)
            x, y = train, values
            # applying the model
            self.clf.fit(x, y)

        def test(self, test_data_csv, sep=","):
            df = pd.read_csv(test_data_csv, sep=sep)
            sample = df[['name', 'location']]
            sample = sample.apply(lambda col: col.str.strip())
            test = sp.hstack(sample.apply(lambda col: self.vect.fit_transform(col)))
            return self.clf.predict(test)

    if __name__ == '__main__':
        ur = unirank('/home/maitreyee/documents/rdata/classifyuniwithr/version2/scored_collg1.csv')
        print ur.test('/home/maitreyee/documents/rdata/classifyuniwithr/version2/test1uni.csv')
This is the error I get when I run the above script:
    /home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py:472: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
      y_ = column_or_1d(y, warn=True)
    Traceback (most recent call last):
      File "uni_rank2.py", line 40, in <module>
        print ur.test('/home/maitreyee/documents/rdata/classifyuniwithr/version2/test1uni.csv')
      File "uni_rank2.py", line 35, in test
        return self.clf.predict(test)
      File "/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py", line 500, in predict
        y = super(BaseSVC, self).predict(X)
      File "/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py", line 290, in predict
        X = self._validate_for_predict(X)
      File "/home/maitreyee/anaconda/lib/python2.7/site-packages/sklearn/svm/base.py", line 443, in _validate_for_predict
        (n_features, self.shape_fit_[1]))
    ValueError: X.shape[1] = 181 should be equal to 865, the number of features at training time
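A likely cause, offered as a hedged reading of the traceback: the script calls `fit_transform` in both `__init__` and `test`, and `CountVectorizer.fit_transform` learns a fresh vocabulary on every call, so the test matrix gets one column per n-gram found in the test data rather than in the training data. The toy bag-of-words below is a plain-Python stand-in for that behaviour, not scikit-learn itself; `fit_vocabulary` and `transform` are hypothetical helpers and the short documents are made up for illustration.

```python
def fit_vocabulary(docs):
    """Learn a word -> column index mapping from the given documents."""
    words = sorted({w for d in docs for w in d.split()})
    return {w: i for i, w in enumerate(words)}


def transform(docs, vocab):
    """Count words per document; words missing from vocab are ignored."""
    rows = []
    for d in docs:
        row = [0] * len(vocab)
        for w in d.split():
            if w in vocab:
                row[vocab[w]] += 1
        rows.append(row)
    return rows


# Made-up miniature data, only to show the column counts.
train_docs = ["indian institute of technology", "delhi"]
test_docs = ["wilson college arts", "mumbai"]

train_vocab = fit_vocabulary(train_docs)
refit_vocab = fit_vocabulary(test_docs)  # analogous to fit_transform on test data

n_train = len(transform(train_docs, train_vocab)[0])  # columns seen at training
n_refit = len(transform(test_docs, refit_vocab)[0])   # columns after refitting
n_reuse = len(transform(test_docs, train_vocab)[0])   # columns when vocab is reused

print(n_train, n_refit, n_reuse)
```

Refitting on the test documents yields a different number of columns, while reusing the training vocabulary keeps the width the classifier was trained on. The same reasoning would apply to the 181 versus 865 mismatch reported above.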
My training set:
    index,name,location,loc_val,sal_val,mean_rank,mean_score,sum_score,score
    0,indian institute of technology (iitdelhi),delhi,0.0128,1.028125,0.0162,0.352375,1.057125,100
    1,indian institute of technology (iitdelhi),delhi,0.0128,0.990625,0.0162,0.339875,1.019625,100
    2,indian institute of technology (iitdelhi),delhi,0.0128,0.959375,0.0162,0.3294583333,0.988375,100
    3,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
    4,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
    5,indian institute of technology (iitbombay),bombay,0.008,1,0.02025,0.34275,1.02825,100
    6,indian institute of technology (iitkharagpur),kharagpur,0.0176,0.991875,0.022275,0.3439166667,1.03175,100
    7,indian institute of technology (iitkharagpur),kharagpur,0.0176,1.0125,0.022275,0.3507916667,1.052375,100
    8,indian institute of technology (iitkharagpur),kharagpur,0.0176,0.95375,0.022275,0.3312083333,0.993625,100
    9,indian institute of technology (iitmadras),madras,0.0224,0.9875,0.02835,0.3460833333,1.03825,100
My test set:
    index,name,location,location_val,salary_val,mean_rank,mean_score,sum_score
    254,gandhi institute of technology and management engineering,vishakapatnam,0.0096,0.4925,0.5508,0.3509666667,1.0529
    255,cochin university of science and technology engineering,cochin,0.0112,0.296875,0.62775,0.3119416667,0.935825
    256,cochin university of science and technology engineering,cochin,0.0112,0.443125,0.62775,0.3606916667,1.082075
    257,cochin university of science and technology engineering,cochin,0.0112,0.296875,0.62775,0.3119416667,0.935825
    258,kc college of arts science & commerce arts,lucknow,0.008,0.21875,0.32805,0.1849333333,0.5548
    259,faculty of arts university of lucknow arts,lucknow,0.0032,0.21875,0.3483,0.1900833333,0.57025
    260,scottish church college arts,kolkata,0.0192,0.21875,0.3564,0.1981166667,0.59435
    261,l.d. arts college arts,ahmedabad,0.0112,0,0.3645,0.1252333333,0.3757
    262,st. francis college women arts,hyderabad,0.0112,0.125,0.2997,0.1453,0.4359
    263,wilson college arts,mumbai,0.008,0.125,0.3807,0.1712333333,0.5137
    264,psg college of arts & science arts,coimbatore,0.0064,0.125,0.3888,0.1734,0.5202
This is my first time using an SVM on string data in Python. If there are better ways to train an SVM on a string data set, kindly let me know.
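One possible direction, sketched under assumptions rather than offered as a definitive fix: keep a separate `CountVectorizer` per text column, call `fit_transform` only on the training data, and call `transform` on the test data so both matrices share the same columns. Flattening the target with `ravel()` should also address the `DataConversionWarning`. The helper names `fit_vectorizers` and `apply_vectorizers` are hypothetical, and the two tiny frames (including the score of 50) are invented so the example runs on its own.

```python
import pandas as pd
import scipy.sparse as sp
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer

TEXT_COLUMNS = ["name", "location"]


def fit_vectorizers(df):
    """Fit one vectorizer per text column and return them with the stacked matrix."""
    vects, blocks = {}, []
    for col in TEXT_COLUMNS:
        vect = CountVectorizer(ngram_range=(1, 3))
        blocks.append(vect.fit_transform(df[col].str.strip()))
        vects[col] = vect
    return vects, sp.hstack(blocks).tocsr()


def apply_vectorizers(df, vects):
    """Encode new data with the already fitted vectorizers (no refitting)."""
    blocks = [vects[col].transform(df[col].str.strip()) for col in TEXT_COLUMNS]
    return sp.hstack(blocks).tocsr()


# Invented miniature data; the real script would read the CSV files instead.
train_df = pd.DataFrame({
    "name": ["indian institute of technology (iitdelhi)", "wilson college arts"],
    "location": ["delhi", "mumbai"],
    "score": [100, 50],
})
test_df = pd.DataFrame({
    "name": ["scottish church college arts"],
    "location": ["kolkata"],
})

vects, X_train = fit_vectorizers(train_df)
y_train = train_df["score"].values.ravel()  # 1d target, avoids the warning

clf = svm.SVC(gamma=0.001, C=100)
clf.fit(X_train, y_train)

X_test = apply_vectorizers(test_df, vects)
pred = clf.predict(X_test)
print(X_train.shape, X_test.shape, pred)
```

With this arrangement the test matrix has the same width as the training matrix by construction. N-grams that appear only in the test data are simply dropped, which is worth keeping in mind when the two sets share few words.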