Python scikit-learn pca.explained_variance_ratio_ cutoff


Guru,

When choosing the number of principal components (k), we choose k to be the smallest value such that, for example, 99% of the variance is retained.

However, in Python scikit-learn, I am not 100% sure whether pca.explained_variance_ratio_ = 0.99 is equal to "99% of the variance is retained". Could anyone enlighten me? Thanks.

  • The Python scikit-learn PCA manual is here:

http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA

Yes, you are right. The pca.explained_variance_ratio_ attribute returns a vector of the variance explained by each dimension, so pca.explained_variance_ratio_[i] gives the variance explained solely by the i+1st dimension.

What you probably want is pca.explained_variance_ratio_.cumsum(). That will return a vector x such that x[i] gives the cumulative variance explained by the first i+1 dimensions.

import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)   # 20 samples, 5 features

my_model = PCA(n_components=5)
my_model.fit_transform(my_matrix)

print(my_model.explained_variance_)                  # variance per component
print(my_model.explained_variance_ratio_)            # fraction of total variance per component
print(my_model.explained_variance_ratio_.cumsum())   # cumulative fraction

[ 1.50756565  1.29374452  0.97042041  0.61712667  0.31529082]
[ 0.32047581  0.27502207  0.20629036  0.13118776  0.067024  ]
[ 0.32047581  0.59549787  0.80178824  0.932976    1.        ]

So in this random toy data, if you picked k=4 you would retain 93.3% of the variance.
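As a small follow-up sketch (not part of the original answer): if you want to turn a cutoff like 99% into a choice of k programmatically, you can search the cumulative vector directly; scikit-learn versions 0.18 and later can also do this for you if you pass the cutoff as a float for n_components.

import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)
cutoff = 0.99

# Option 1: find the smallest k whose cumulative ratio reaches the cutoff.
model = PCA(n_components=5).fit(my_matrix)
cumulative = model.explained_variance_ratio_.cumsum()
k = np.searchsorted(cumulative, cutoff) + 1   # +1 because searchsorted is 0-based
print(k)   # smallest number of components retaining >= 99% of the variance

# Option 2 (assumes scikit-learn >= 0.18): pass the cutoff as a float and
# let PCA select the number of components itself; this requires the full solver.
model = PCA(n_components=cutoff, svd_solver='full').fit(my_matrix)
print(model.n_components_)

On this toy data both options give k=5, since the cumulative ratio only passes 0.99 at the last component.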

