Python scikit-learn pca.explained_variance_ratio_ cutoff


Hi gurus,

When choosing the number of principal components (k), we choose k to be the smallest value such that, for example, 99% of the variance is retained.

However, in Python's scikit-learn, I am not 100% sure that pca.explained_variance_ratio_ = 0.99 is equal to "99% of the variance is retained". Could anyone enlighten me? Thanks.

  • The scikit-learn PCA manual is here:

http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.pca.html#sklearn.decomposition.pca

Yes, you are nearly right. The pca.explained_variance_ratio_ attribute returns a vector of the variance explained by each dimension, so pca.explained_variance_ratio_[i] gives the variance explained solely by the (i+1)-st dimension.

You probably want pca.explained_variance_ratio_.cumsum(), which returns a vector x such that x[i] gives the cumulative variance explained by the first i+1 dimensions.

import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)  # 20 samples, 5 features of random data

my_model = PCA(n_components=5)
my_model.fit_transform(my_matrix)

print(my_model.explained_variance_)           # variance explained by each component
print(my_model.explained_variance_ratio_)     # as a fraction of total variance
print(my_model.explained_variance_ratio_.cumsum())  # cumulative fraction

[ 1.50756565  1.29374452  0.97042041  0.61712667  0.31529082]
[ 0.32047581  0.27502207  0.20629036  0.13118776  0.067024  ]
[ 0.32047581  0.59549787  0.80178824  0.932976    1.        ]

So in my random toy data, if I picked k=4 I would retain 93.3% of the variance.
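If you want to pick k programmatically rather than by eye, you can search the cumulative sum for the first index that crosses your threshold. Here is a minimal sketch; the 0.99 threshold and the variable names are illustrative, and the float n_components variant assumes a reasonably recent scikit-learn version:

import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)

# Fit with all components first, then inspect the cumulative ratio.
my_model = PCA(n_components=5)
my_model.fit(my_matrix)
cumulative = my_model.explained_variance_ratio_.cumsum()

# Smallest k whose cumulative explained variance reaches the threshold.
threshold = 0.99  # illustrative cutoff, not from the original post
k = int(np.searchsorted(cumulative, threshold)) + 1
print(k)  # 5 for this toy data, since k=4 only retains ~93.3%

# Alternatively, recent scikit-learn versions accept a float n_components,
# which keeps just enough components to retain that fraction of variance.
auto_model = PCA(n_components=0.99, svd_solver='full')
auto_model.fit(my_matrix)
print(auto_model.n_components_)  # same k, chosen by the library

The searchsorted call works because the cumulative sum is monotonically increasing, so the first index at or above the threshold is exactly the cutoff you want.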

