python - AgglomerativeClustering with disconnected connectivity constraint
I have tried this on both the latest release, 0.16.1, and the latest bleeding-edge version of sklearn, '0.17.dev0', and the issue appears in both.
I use
sklearn.cluster.AgglomerativeClustering(affinity='precomputed', connectivity=cmat, linkage='complete')
where cmat is a connectivity matrix that contains disconnected components. As indicated in the source code, I get the warning "UserWarning: the number of connected components of the connectivity matrix is * > 1. Completing it to avoid stopping the tree early." (where * is the number of components).
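To check in advance whether a connectivity matrix will trigger this warning, one can count its connected components with SciPy. This is only a minimal sketch: the 4-point cmat below is made up for illustration and is not the matrix from my data.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical connectivity matrix: points 0 and 1 are linked, points 2 and 3
# are linked, and there is no edge between the two pairs.
cmat = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]))

# Treat the graph as undirected and label each point with its component.
n_components, labels = connected_components(cmat, directed=False)
print(n_components)  # 2, so the warning would be raised for this matrix
print(labels)        # one component label per point
```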
However, reading the source code, I see that at the point where the connectivity matrix is completed, the developers themselves wonder whether the clustering can take place without completing the matrix:
"XXX: Can we do without completing the matrix?"
I am interested in this development. Does anyone know whether sklearn is planning to fix this and make clustering possible without completing the matrix? Has anyone implemented this themselves? I would gladly take advice on this!
For those interested, I figured out how to avoid the problem. If you use a distance matrix as input the way I do, and its values are between 0 and 1, add a distance value of 1 between the corresponding disconnected components. I checked the source code, and the way it completes the connectivity matrix is by considering the minimum distance between the disconnected components. In this way, adding the highest possible distance of 1 (or some other maximum distance) forces the connectivity matrix to be completed in a more natural way, rather than merging the disconnected components.
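The workaround can be sketched roughly as follows. The helper name separate_components and the small dist and cmat arrays are hypothetical, chosen only for illustration; the sketch assumes a symmetric distance matrix with values in [0, 1] and uses SciPy to find the components.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def separate_components(dist, cmat, max_dist=1.0):
    """Return a copy of dist with every cross-component distance set to max_dist."""
    _, labels = connected_components(cmat, directed=False)
    # different[i, j] is True when points i and j lie in different components.
    different = labels[:, None] != labels[None, :]
    out = np.array(dist, dtype=float, copy=True)
    out[different] = max_dist
    return out


# Hypothetical data: four points, with components {0, 1} and {2, 3}.
dist = np.array([
    [0.0, 0.2, 0.3, 0.9],
    [0.2, 0.0, 0.4, 0.8],
    [0.3, 0.4, 0.0, 0.1],
    [0.9, 0.8, 0.1, 0.0],
])
cmat = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]))

adjusted = separate_components(dist, cmat)
print(adjusted)
```

The adjusted matrix is what I would then pass to fit as the precomputed distances, together with the same cmat. The warning should still appear, since the connectivity matrix itself is unchanged; only the distances used when it gets completed are different.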