Inheritance diagram for nipy.neurospin.clustering.hierarchical_clustering:
These routines perform hierarchical agglomerative clustering of input data. The following alternatives are available:
- Distance-based average-link
- Similarity-based average-link
- Distance-based maximum-link
- Ward’s algorithm under graph constraints
- Ward’s algorithm without graph constraints
In this latest version, the results are returned in a ‘WeightedForest’ structure, which gives access to the clustering hierarchy and makes the result easier to plot.
For backward compatibility, *_segment versions of the algorithms are also provided. They keep the old API, except that the qmax parameter now represents the desired number of clusters.
Author: Bertrand Thirion, Pamela Guevara, 2006-2009
Bases: nipy.neurospin.graph.graph.Forest
This is a weighted Forest structure, i.e. a collection of trees:
- each node has one parent and children (hierarchical structure)
- some of the nodes can be viewed as leaves, others as roots
- the edges within a tree are associated with a weight: +1 from child to parent, -1 from parent to child
- additionally, the nodes have a value, called ‘height’, which is especially useful for dendrograms
fields:
- V: (int, >0) the number of vertices
- E: (int) the number of edges
- parents: array of shape (self.V), the parent array
- edges: array of shape (self.E, 2) representing pairwise neighbors
- weights: array of shape (self.E), +1/-1 for ascending/descending links
- children: list of arrays that represent the children of each node
- height: array of shape (self.V)
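To make the field list concrete, here is a minimal pure-Python sketch of such a structure. It is not the nipy class: the name `ToyWeightedForest` is hypothetical, and the convention that a root is its own parent is an assumption made for this illustration.

```python
class ToyWeightedForest:
    """Hypothetical stand-in for the structure described above (not the nipy class)."""

    def __init__(self, parents, height):
        self.parents = list(parents)   # parents[v] is the parent of node v
        self.height = list(height)     # one value per node
        self.V = len(self.parents)
        self.children = [[] for _ in range(self.V)]
        self.edges = []
        self.weights = []
        for child, parent in enumerate(self.parents):
            if parent == child:
                continue  # assumed convention: a root is its own parent
            self.children[parent].append(child)
            # Ascending link (child -> parent) carries weight +1 ...
            self.edges.append((child, parent))
            self.weights.append(+1)
            # ... and the descending link (parent -> child) carries weight -1.
            self.edges.append((parent, child))
            self.weights.append(-1)
        self.E = len(self.edges)

    def roots(self):
        return [v for v in range(self.V) if self.parents[v] == v]

    def leaves(self):
        return [v for v in range(self.V) if not self.children[v]]


# Three leaves (0, 1, 2), one internal node (3) and one root (4).
forest = ToyWeightedForest(parents=[3, 3, 4, 4, 4],
                           height=[0.0, 0.0, 0.0, 1.0, 2.5])
print(forest.roots(), forest.leaves(), forest.E)
```

Storing only `parents` and `height` and deriving the rest keeps the fields consistent with each other; whether the real class does the same is not stated in this page.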
Average link clustering based on a pairwise distance matrix.
Average_Link_Distance(D,verbose=0):
INPUT:
- D: a (n,n) distance matrix between some items
- verbose=0: verbosity level
OUTPUT:
- t: a WeightedForest structure that represents the dendrogram of the data
NOTE: this method has not been optimized
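The average-link procedure itself can be sketched in a few lines of plain Python. This is an illustrative, unoptimized version written for this page, not the nipy implementation; the function name `average_link_sketch`, the `(parents, height)` return value, and the rule that a root is its own parent are assumptions.

```python
def average_link_sketch(D):
    """Naive average-link clustering of an (n, n) distance matrix (nested lists).

    Returns (parents, height) for the 2n-1 nodes of the dendrogram:
    nodes 0..n-1 are the items, nodes n..2n-2 are the successive merges.
    """
    n = len(D)
    members = {i: [i] for i in range(n)}   # active cluster id -> item indices
    parents = list(range(2 * n - 1))       # assumption: a root is its own parent
    height = [0.0] * (2 * n - 1)

    def avg_dist(a, b):
        # Mean of all pairwise distances between the two clusters.
        total = sum(D[i][j] for i in members[a] for j in members[b])
        return total / (len(members[a]) * len(members[b]))

    for new in range(n, 2 * n - 1):
        ids = sorted(members)
        # Pick the pair of active clusters with the smallest average distance.
        d, a, b = min((avg_dist(a, b), a, b)
                      for k, a in enumerate(ids) for b in ids[k + 1:])
        parents[a] = parents[b] = new
        height[new] = d
        members[new] = members.pop(a) + members.pop(b)
    return parents, height


D = [[0, 1, 4, 5],
     [1, 0, 3, 6],
     [4, 3, 0, 2],
     [5, 6, 2, 0]]
parents, height = average_link_sketch(D)
print(parents)  # [4, 4, 5, 5, 6, 6, 6]
print(height)   # [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 4.5]
```

Each pass recomputes every inter-cluster average from scratch, which is simple to read but costly for large n.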
Average link clustering based on a pairwise distance matrix.
Average_Link_Distance(D,stop=-1,qmax=-1,verbose=0):
INPUT:
- D: a (n,n) distance matrix between some items
- stop=-1: stopping criterion, i.e. distance threshold at which further merges are forbidden. By default, all merges are performed
- qmax = 1: the number of desired clusters (in the limit of stop)
- verbose=0: verbosity level
OUTPUT:
- u: a labelling of the graph vertices according to the criterion
- cost: the cost of each merge step during the clustering procedure
NOTE: this method has not been optimized
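How stop and qmax might interact can be illustrated with a small standalone sketch. The interpretation used here is an assumption, not taken from the nipy source: merging continues until qmax clusters remain, or until the cheapest available merge exceeds stop (a negative stop disables the threshold). The function name `average_link_segment_sketch` is hypothetical.

```python
def average_link_segment_sketch(D, stop=-1, qmax=1):
    """Average-link clustering that returns labels and per-merge costs.

    Assumed semantics: stop < 0 means no distance threshold;
    qmax is the number of clusters at which merging ends.
    """
    n = len(D)
    clusters = [[i] for i in range(n)]
    cost = []
    while len(clusters) > max(qmax, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(D[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if stop >= 0 and d > stop:
            break  # the cheapest merge already exceeds the threshold
        cost.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Turn the remaining clusters into one integer label per item.
    u = [0] * n
    for label, cluster in enumerate(clusters):
        for i in cluster:
            u[i] = label
    return u, cost


D = [[0, 1, 4, 5],
     [1, 0, 3, 6],
     [4, 3, 0, 2],
     [5, 6, 2, 0]]
u_q, cost_q = average_link_segment_sketch(D, qmax=2)
u_s, cost_s = average_link_segment_sketch(D, stop=1.5)
print(u_q, cost_q)  # [0, 0, 1, 1] [1.0, 2.0]
print(u_s, cost_s)  # [0, 0, 1, 2] [1.0]
```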
Ward clustering based on a Feature matrix
t = Ward(feature,verbose=0):
INPUT:
- feature: a (n,p) feature matrix representing n p-dimensional items to be clustered
- verbose=0: verbosity level
OUTPUT:
- t: a WeightedForest structure that represents the dendrogram of the data
NOTE: this method uses the optimized C routine if “quick” is true.
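For comparison with the average-link case, the following is a minimal sketch of Ward's criterion without graph constraints, in plain Python. It is not the nipy routine (and not the optimized C code mentioned in the note): the name `ward_sketch`, the `(parents, height)` return value, and the choice to store each merge cost as the node height are assumptions made for this illustration. The merge cost used is the standard increase in within-cluster sum of squares, n_a n_b / (n_a + n_b) times the squared distance between centroids.

```python
def ward_sketch(feature):
    """Naive Ward clustering of n items given as rows of p features.

    Returns (parents, height) for the 2n-1 dendrogram nodes; the height of
    a merge node is the cost of that merge (assumed convention).
    """
    n = len(feature)
    size = {i: 1 for i in range(n)}
    centroid = {i: [float(x) for x in feature[i]] for i in range(n)}
    parents = list(range(2 * n - 1))   # assumption: a root is its own parent
    height = [0.0] * (2 * n - 1)

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        sq = sum((x - y) ** 2 for x, y in zip(centroid[a], centroid[b]))
        return size[a] * size[b] / (size[a] + size[b]) * sq

    for new in range(n, 2 * n - 1):
        ids = sorted(size)
        c, a, b = min((merge_cost(a, b), a, b)
                      for k, a in enumerate(ids) for b in ids[k + 1:])
        parents[a] = parents[b] = new
        height[new] = c
        total = size[a] + size[b]
        # The centroid of the merged cluster is the size-weighted mean.
        centroid[new] = [(size[a] * x + size[b] * y) / total
                         for x, y in zip(centroid[a], centroid[b])]
        size[new] = total
        for old in (a, b):
            del size[old]
            del centroid[old]
    return parents, height


feature = [[0.0], [1.0], [5.0], [6.0]]
ward_parents, ward_height = ward_sketch(feature)
print(ward_parents)  # [4, 4, 5, 5, 6, 6, 6]
print(ward_height)   # [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 25.0]
```

Unlike the distance-based variants, this criterion needs the feature vectors themselves, since centroids must be recomputed after each merge.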