Python scikit-learn pca.explained_variance_ratio_ cutoff
Gurus,
When choosing the number of principal components k, we pick the smallest value of k such that, for example, 99% of the variance is retained.
However, in Python scikit-learn, I am not 100% sure that pca.explained_variance_ratio_ = 0.99 is the same as "99% of the variance is retained". Could anyone enlighten me? Thanks.
- The Python scikit-learn PCA manual is here.
Yes, you are right. pca.explained_variance_ratio_ returns a vector of the variance explained by each dimension: pca.explained_variance_ratio_[i] gives the variance explained solely by the (i+1)-st dimension.

What you want is pca.explained_variance_ratio_.cumsum(). That returns a vector x such that x[i] is the cumulative variance explained by the first i+1 dimensions.
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)
my_model = PCA(n_components=5)
my_model.fit_transform(my_matrix)
print(my_model.explained_variance_)        # variance explained by each component
print(my_model.explained_variance_ratio_)  # same, as a fraction of total variance
print(my_model.explained_variance_ratio_.cumsum())  # cumulative fraction
[ 1.50756565  1.29374452  0.97042041  0.61712667  0.31529082]
[ 0.32047581  0.27502207  0.20629036  0.13118776  0.067024  ]
[ 0.32047581  0.59549787  0.80178824  0.932976    1.        ]
So with this random toy data, if you picked k=4 you would retain 93.3% of the variance.
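If you want to pick the cutoff programmatically rather than by eye, here is a minimal sketch: take the first index where the cumulative ratio reaches the threshold. The 0.99 target and the names target and k are my own choices, not from the original post.

import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)

target = 0.99  # assumed threshold: fraction of variance to retain
model = PCA().fit(my_matrix)
cumulative = model.explained_variance_ratio_.cumsum()
# argmax on a boolean array returns the first True, i.e. the first
# 0-based index reaching the target; +1 converts it to a component
# count. This assumes the target is actually reachable.
k = int(np.argmax(cumulative >= target)) + 1
print(k)  # 5 for this toy data, since k=4 only reaches ~93.3%

scikit-learn can also do this selection for you: passing a float between 0 and 1 as n_components, e.g. PCA(n_components=0.99), keeps just enough components to retain that fraction of the variance.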