Inheritance diagram for nipy.neurospin.clustering.hierarchical_clustering:
These routines perform some hierrachical agglomerative clustering of some input data. The following alternatives are proposed: - Distance based average-link - Similarity-based average-link - Distance based maximum-link - Ward’s algorithm under graph constraints - Ward’s algorithm without graph constraints
In this latest version, the results are returned in a ‘WeightedForest’ structure, which gives access to the clustering hierarchy, facilitates the plot of the result etc.
For back-compatibility, *_segment versions of the algorithms have been appended, with the old API (except the qmax parameter, which now represents the number of wanted clusters)
Author : Bertrand Thirion,Pamela Guevara, 2006-2009
Bases: nipy.neurospin.graph.forest.Forest
This is a weighted Forest structure, i.e. a tree - ecah node has one parent and children (hierarchical structure) - some of the nodes can be viewed as leaves, other as roots - the edges within a tree are associated with a weight: +1 from child to parent -1 from parent to child - additionally, the nodes have a value, which is called ‘height’, especially useful from dendrograms
Methods
Parameters : | V: the number of edges of the graph : parents=None: array of shape (V) :
height=None: array of shape(V) :
|
---|
Creates the Minimum Spanning Tree self using Kruskal’s algo. efficient is self is sparse
Returns : | K: WeightedGraph instance :
|
---|
Creates the Minimum Spanning Tree self using Kruskal’s algo. efficient is self is sparse
Returns : | K: WeightedGraph instance :
|
---|
label = self.Voronoi_Labelling(seed) performs a voronoi labelling of the graph
Parameters : | seed array of shape (nseeds), type (np.int), :
|
---|---|
Returns : | - labels : array of shape (self.V) the labelling of the vertices fixme: how is dealt the case of diconnected graph ? : |
Defines the graph as the Voronoi diagram (VD) that links the seeds. The VD is defined using the sample points.
Parameters : | seeds: array of shape (self.V,dim) : samples: array of shape (nsamples,dim) : |
---|
returns the sum of weighted degree of graph self
Parameters : | c (int): side selection :
|
---|---|
Returns : | wd : array of shape (self.V),
|
Create the adjacency matrix of self
Returns : | A : an ((self.V*self.V),np.double) array
|
---|
returns all the distances of the graph as a tree
Parameters : | seed=None array of shape(nbseed) with valuesin [0..self.V-1] :
|
---|---|
Returns : | dg: array of shape(nseed, self.V): the resulting distance : |
self.anti_symmeterize() anti-symmeterize the self , ie produces the graph whose adjacency matrix would be the antisymmetric part of its current adjacency matrix
Returns an array of labels corresponding to the different connex components of the graph.
Returns : | label: array of shape(self.V), labelling of the vertices : |
---|
Check that the proposed is indeed a graph, i.e. contains no loop
Returns : | a boolean b=0 iff there are loops, 1 otherwise : |
---|
Check that height[parents[i]]>=height[i] for all nodes
Extraction of the graphe cliques these are defined using replicator dynamics equations
Returns : | - cliques: array of shape (self.V), type (np.int) :
|
---|
self.compute_children() define the children list
Returns : | children: a list of self.V lists, :
|
---|
Returns the index of the edge (j,i) for each edge (i,j) Note: a C implementation might be necessary
returns a copy of self
self.cut_redudancies() Remove possibly redundant edges: if an edge (ab) is present twice in the edge matrix, only the first instance in kept. The weights are processed accordingly
Returns : | - E(int): the number of edges, self.E : |
---|
define the edge and weights array
Returns the degree of the graph vertices.
Returns : | rdegree: (array, type=int, shape=(self.V,)), the right degrees : ldegree: (array, type=int, shape=(self.V,)), the left degrees : |
---|
compute a labelling of the nodes which is 0 for the leaves, 1 for their parents etc and maximal for the roots
Returns : | depth: array of shape (self.V): the depth values of the vertices : |
---|
returns all the [graph] geodesic distances starting from seed it is mandatory that the graph weights are non-negative
Parameters : | seed (int, >-1,<self.V) or array of shape(p) :
|
---|---|
Returns : | dg: array of shape (self.V) , :
|
Sets the graph to be the eps-nearest-neighbours graph of the data
Parameters : | X: array of shape (self.V) or (self.V,p) :
eps=1. (float), the neighborhood width : |
---|---|
Returns : | self.E the number of edges of the resulting graph : |
Idem plot, but the valid edges are enahanced
Idem plot, but the valid edges are enahanced
Returns : | ax, the axes handle of the plot : |
---|
Compute all the geodesic distances starting from seeds it is mandatory that the graph weights are non-negative
Parameters : | seed= None: array of shape (nbseed), type np.int :
|
---|---|
Returns : | dg array of shape (nbseed,self.V) :
|
Sets the graph to be the topological neighbours graph of the three-dimensional coordinates set xyz, in the k-connectivity scheme
Parameters : | xyz: array of shape (self.V, 3) and type np.int, : k = 18: the number of neighbours considered. (6, 18 or 26) : |
---|---|
Returns : | E(int): the number of edges of self : |
sets the edges of self according to the adjacency matrix M
Parameters : | M: array of shape(sef.V, self.V) : |
---|
To get the number of edges in the graph
To get the number of vertices in the graph
returns the list list of children arrays in all the forest if v==-1 or the children of v otherwise
returns the nodes that are children of v
To get the graph’s edges
get the height array
To get the graph’s vertices (as id)
States whether self is connected or not
returns a bool array of shape(self.V) so that isleaf==1 iff the node is a leaf in the forest (has no kids)
returns a bool array of shape(self.V) so that isleaf==1 iff the node is a root in the forest i.e. : is its own parent
E = knn(X, k) set the graph to be the k-nearest-neighbours graph of the data
Parameters : | X array of shape (self.V) or (self.V,p) :
k=1 : is the number of neighbours considered |
---|---|
Returns : | - self.E (int): the number of edges of the resulting graph : |
tests whether the given nodes within ids represent all the leaves of a certain subtree of self
Parameters : | idds: array of shape (n) that takes values in [0..self.V-1] : custom == False, boolean :
|
---|
Returns : | the left incidence matrix of self :
|
---|
returns the set of neighbors of self as a list of arrays
returns the list of all non-trivial subtrees in the graph Caveat: theis function assumes that the vertices are sorted in a way such that parent[i]>i forall i Only the leaves are listeed, not the subtrees themselves
Returns the indexes of the vertices within the main cc
Returns : | idx: array of shape (sizeof main cc) : |
---|
merge the branches of the forest that are the only child of the parent branch into their child
makes self the MST of the array X
Parameters : | X: an array of shape (self.V,dim) :
|
---|---|
Returns : | tl (float) the total length of the mst : |
Normalize the graph according to the index c Normalization means that the sum of the edges values that go into or out each vertex must sum to 1
Parameters : | c=0 in {0,1,2}, optional: index that designates the way :
|
---|
partition the tree according to a cut criterion
Plot the dendrogram associated with self the rank of the data in the dendogram is returned
Parameters : | ax: axis handle, optional : |
---|---|
Returns : | ax, the axis handle : |
Plot the dendrogram associated with self
plot the height of the non-leaves nodes
label = self.propagate_upward(label) Assuming that label is a certain positive integer field (i.e. labels) that is defined at the leaves of the tree and can be compared, this propagates these labels to the parents whenever the children nodes have coherent properties otherwise the parent value is unchanged
Parameters : | label: array of shape(self.V) : |
---|---|
Returns : | label: array of shape(self.V) : |
propagates some binary property in the forest that is defined in the leaves so that prop[parents] = logical_and(prop[children])
Parameters : | prop, array of shape(self.V), the input property : |
---|---|
Returns : | prop, array of shape(self.V), the output property field : |
Removes all the edges for which valid==0
Parameters : | valid, an array of shape (self.E) : |
---|
Removes trivial edges, i.e. edges that are (vv)-like self.weights and self.E are corrected accordingly
Returns : | - self.E (int): The number of edges : |
---|
Reorder the graph according to the index c
Parameters : | c=0 in {0,1,2}, index that designates the array :
|
---|
reorder the tree so that the leaves come first then their parents and so on, and the roots are last the permutation necessary to apply to all vertex-based information
Returns : | order: array of shape(self.V) :
|
---|
Returns : | the right incidence matrix of self :
|
---|
l = self.subtree(k) returns an array of the nodes included in the subtree rooted in k
Parameters : | k (int): the vertex from which the subtree is searched : |
---|---|
Returns : | idx : array of shape>=1 the index of the nodes beneath k |
Sets the graph’s edges
Compute the weights of the graph as the distances between the corresponding rows of X, which represents an embdedding of self
Parameters : | X array of shape (self.V, edim), :
|
---|
Compute the weights of the graph as a gaussian function of the dinstance between the corresponding rows of X, which represents an embdedding of self
Parameters : | X array of shape (self.V,dim) :
sigma=0, float : the parameter of the gaussian function |
---|
set the height array
Parameters : | weights : an array of shape(self.V), edges weights |
---|
a = self.show(X=None) plots the current graph in 2D
Parameters : | X=None, array of shape (self.V,2) :
ax: ax handle, optional : |
---|---|
Returns : | ax: axis handle : |
returns a MST that based on self.weights Note: self must be connected
idem as partition, but a number of components are supplied instead
creates a subforest with the vertices for which valid>0 and with the correponding set of edges the children of deleted vertices become their own parent
Parameters : | valid: array of shape (self.V) : |
---|---|
Returns : | a new forest instance : |
Creates a subgraph with the vertices for which valid>0 and with the correponding set of edges
Parameters : | valid array of shape (self.V): nonzero for vertices to be retained : |
---|---|
Returns : | G WeightedGraph instance, the desired subgraph of self : |
symmeterize the graphself , ie produces the graph whose adjacency matrix would be the symmetric part of its current adjacency matrix
Returns : | sp: scipy.sparse matrix instance, :
|
---|
converts the graph to a neighboring system The neighboring system is nothing but a (sparser) representation of the edge matrix
Returns : | ci, ne, we: arrays of shape (self.V+1), (self.E), (self.E) :
|
---|
return the maximal depth of any node in the tree
Average link clustering based on a pairwise distance matrix.
Parameters : | D array of shape (nbitem,nbitem) with nonnegative values :
verbose=0, verbosity level : |
---|---|
Returns : | t a weightForest structure that represents the dendrogram of the data : |
Average link clustering based on a pairwise distance matrix.
Parameters : | D: a (n,n) distance matrix between some items : stop=-1: stopping criterion, i.e. distance threshold at which :
qmax = 1; the number of desired clusters :
verbose=0, verbosity level : |
---|---|
Returns : | u: a labelling of the graph vertices according to the criterion : cost the cost of each merge step during the clustering procedure : |
Average link clustering based on data matrix.
Parameters : | X array of shape (nbitem,dim): data matrix :
verbose=0, verbosity level : |
---|---|
Returns : | t a weightForest structure that represents the dendrogram of the data : |
Agglomerative function based on a (hopefully sparse) similarity graph
Parameters : | G the input graph : |
---|---|
Returns : | t a weightForest structure that represents the dendrogram of the data : |
Agglomerative function based on a (hopefully sparse) similarity graph
Parameters : | G the input graph : stop=0: the stopping criterion : qmax=1: the number of desired clusters :
|
---|---|
Returns : | u: array of shape (G.V) :
cost: array of shape (G.V (?)) :
|
maximum link clustering based on a pairwise distance matrix.
Parameters : | D: array of shape (n,n) :
verbose=0, verbosity level : |
---|---|
Returns : | t a weightForest structure that represents the dendrogram of the data : |
maximum link clustering based on a pairwise distance matrix.
Returns : | u: array of shape (nbitems) :
cost: array of shape (nbitems-1) :
|
---|
Maximum link clustering based on data matrix.
Parameters : | X: array of shape (nbitem,dim) :
verbose=0, verbosity level : |
---|---|
Returns : | t a weightForest structure that represents the dendrogram of the data : |
Agglomerative function based on a topology-defining graph and a feature matrix.
Returns : | t: a WeightedForest structure that represents the dendrogram : |
---|
Agglomerative function based on a field structure
Parameters : | F the input field (graph+feature) : stop = -1: the stopping crterion :
qmax=1: the maximum number of desired clusters :
|
---|---|
Returns : | u: array of shape (F.V) :
cost array of shape (F.V-1) :
|
Agglomerative function based on a topology-defining graph and a feature matrix.
Parameters : | G graph instance, :
feature: array of shape (G.V,dim_feature): :
|
---|---|
Returns : | t: weightForest instance, :
|
Agglomerative function based on a topology-defining graph and a feature matrix.
Parameters : | G: neurospin.graph.WeightedGraph instance :
feature array of shape (G.V,dim_feature) :
stop = -1: the stopping crterion :
qmax=1: the maximum number of desired clusters :
|
---|---|
Returns : | u: array of shape (G.V) :
cost: array of shape (G.V-1) :
|
Agglomerative function based on a topology-defining graph and a feature matrix.
Parameters : | G the input graph (a topological graph essentially) : feature array of shape (G.V,dim_feature) :
stop = -1: the stopping crterion :
qmax=1: the maximum number of desired clusters :
|
---|---|
Returns : | u: array of shape (G.V): :
cost: array of shape (G.V-1) :
NOTE: : A euclidean distance is used in the feature space : caveat : when the number of cc in G (nbcc)
|
Ward clustering based on a Feature matrix
Returns : | t a weightForest structure that represents the dendrogram of the data : |
---|