
Module: neurospin.clustering.hierarchical_clustering

Inheritance diagram for nipy.neurospin.clustering.hierarchical_clustering:

These routines perform hierarchical agglomerative clustering of some input data. The following alternatives are proposed:
- Distance-based average-link
- Similarity-based average-link
- Distance-based maximum-link
- Ward's algorithm under graph constraints
- Ward's algorithm without graph constraints

In this latest version, the results are returned in a ‘WeightedForest’ structure, which gives access to the clustering hierarchy and facilitates plotting of the result.

For backward compatibility, *_segment versions of the algorithms have been appended, with the old API (except the qmax parameter, which now represents the desired number of clusters).

Author: Bertrand Thirion, Pamela Guevara, 2006-2009
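
As an illustration, here is a minimal usage sketch. It assumes only the import path shown on this page, a made-up two-group feature matrix, and that matplotlib is available for the plotting calls; it is a sketch, not a prescribed workflow.

import numpy as np
# Hypothetical usage sketch; the import path follows the module name above.
from nipy.neurospin.clustering.hierarchical_clustering import ward_simple

# 30 items described by 2 features, forming two loose groups
X = np.vstack((np.random.randn(15, 2), np.random.randn(15, 2) + 5))

t = ward_simple(X)   # WeightedForest representing the dendrogram
t.plot()             # plot the dendrogram
t.plot_height()      # plot the heights of the non-leaf nodes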

Class

WeightedForest

class nipy.neurospin.clustering.hierarchical_clustering.WeightedForest(V, parents=None, height=None)

Bases: nipy.neurospin.graph.forest.Forest

This is a weighted forest structure, i.e. a tree in which:
- each node has one parent and a set of children (hierarchical structure)
- some of the nodes can be viewed as leaves, others as roots
- the edges within a tree are associated with a weight: +1 from child to parent, -1 from parent to child
- additionally, the nodes have a value, called ‘height’, which is especially useful for dendrograms

Methods

Kruskal
Kruskal_dev
Voronoi_Labelling
Voronoi_diagram
WeightedDegree
adjacency
all_distances
anti_symmeterize
cc
check
check_compatible_height
cliques
complete
compute_children
converse_edge
copy
cut_redundancies
define_graph_attributes
degrees
depth_from_leaves
dijkstra
eps
fancy_plot
fancy_plot_
floyd
from_3d_grid
from_adjacency
get_E
get_V
get_children
get_descendents
get_edges
get_height
get_vertices
get_weights
is_connected
isleaf
isroot
knn
leaves_of_a_subtree
left_incidence
list_of_neighbors
list_of_subtrees
main_cc
merge_simple_branches
mst
normalize
partition
plot
plot2
plot_height
propagate_upward
propagate_upward_and
remove_edges
remove_trivial_edges
reorder
reorder_from_leaves_to_roots
right_incidence
rooted_subtree
set_edges
set_euclidian
set_gaussian
set_height
set_weights
show
skeleton
split
subforest
subgraph
symmeterize
to_coo_matrix
to_neighb
tree_depth
__init__(V, parents=None, height=None)
Parameters :

V: the number of vertices of the graph :

parents=None: array of shape (V) :

the parents of the nodes. By default, the parents are set to range(V), i.e. each node is its own parent, and each node is a tree

height=None: array of shape(V) :

the height of the nodes
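
For illustration, a minimal construction sketch based only on the signature above (the three-node forest and its height values are made up for the example) could be:

import numpy as np
# Hypothetical construction sketch for the documented constructor.
from nipy.neurospin.clustering.hierarchical_clustering import WeightedForest

V = 3
parents = np.array([2, 2, 2])    # nodes 0 and 1 are leaves, node 2 is the root
height = np.array([0., 0., 1.])  # leaves at height 0, root above them
f = WeightedForest(V, parents, height)

f.check_compatible_height()      # here height[parents[i]] >= height[i] holds
print(f.get_height())            # the height array that was just set
print(f.tree_depth())            # maximal depth of any node in the tree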

Kruskal()

Creates the minimum spanning tree of self using Kruskal's algorithm; efficient if self is sparse.

Returns :

K: WeightedGraph instance :

the resulting MST

Kruskal_dev()

Creates the minimum spanning tree of self using Kruskal's algorithm; efficient if self is sparse.

Returns :

K: WeightedGraph instance :

the resulting MST

Voronoi_Labelling(seed)

label = self.Voronoi_Labelling(seed) performs a Voronoi labelling of the graph

Parameters :

seed array of shape (nseeds), type (np.int), :

vertices from which the cells are built

Returns :

- labels : array of shape (self.V) the labelling of the vertices

FIXME: how is the case of a disconnected graph handled? :

Voronoi_diagram(seeds, samples)

Defines the graph as the Voronoi diagram (VD) that links the seeds. The VD is defined using the sample points.

Parameters :

seeds: array of shape (self.V,dim) :

samples: array of shape (nsamples,dim) :

WeightedDegree(c)
returns the weighted degree of each vertex of graph self
Parameters :

c (int): side selection :

if c == 0, the left side of the edges is considered; if c == 1, the right side is considered

Returns :

wd : array of shape (self.V),

the resulting weighted degree

Note: slow implementation

adjacency()

Create the adjacency matrix of self

Returns :

A : a (self.V, self.V) array of type np.double

adjacency matrix of the graph

all_distances(seed=None)

returns all the geodesic distances of the graph, from the seeds to every vertex

Parameters :

seed=None array of shape(nbseed) with values in [0..self.V-1] :

set of vertices from which the distances are computed

Returns :

dg: array of shape(nseed, self.V): the resulting distance :

anti_symmeterize()

self.anti_symmeterize() anti-symmetrizes self, i.e. produces the graph whose adjacency matrix is the antisymmetric part of the current adjacency matrix

cc()

Returns an array of labels corresponding to the different connected components of the graph.

Returns :label: array of shape(self.V), labelling of the vertices :
check()

Check that self is indeed a forest, i.e. contains no loop

Returns :a boolean b=0 iff there are loops, 1 otherwise :
check_compatible_height()

Check that height[parents[i]]>=height[i] for all nodes

cliques()

Extraction of the graph's cliques; these are defined using replicator dynamics equations

Returns :

- cliques: array of shape (self.V), type (np.int) :

labelling of the vertices according to the clique they belong to

complete() makes self a complete graph (i.e. each pair of vertices is an edge)
compute_children()

self.compute_children() defines the children list

Returns :

children: a list of self.V lists, :

that yields the children of each node

converse_edge()

Returns the index of the edge (j, i) for each edge (i, j). Note: a C implementation might be necessary

copy()

returns a copy of self

cut_redundancies()

self.cut_redundancies() removes possibly redundant edges: if an edge (a, b) is present twice in the edge matrix, only the first instance is kept. The weights are processed accordingly

Returns :- E(int): the number of edges, self.E :
define_graph_attributes()

define the edge and weights array

degrees()

Returns the degree of the graph vertices.

Returns :

rdegree: (array, type=int, shape=(self.V,)), the right degrees :

ldegree: (array, type=int, shape=(self.V,)), the left degrees :

depth_from_leaves()

compute a labelling of the nodes which is 0 for the leaves, 1 for their parents, etc., and maximal for the roots

Returns :depth: array of shape (self.V): the depth values of the vertices :
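
This labelling can be illustrated with a small pure-numpy loop over a made-up parents array (assuming, as in list_of_subtrees below, that parents[i] >= i so that children are visited before their parents; this is an illustration of the rule, not the library code):

import numpy as np

# Depth labelling: leaves get 0, parents get 1 + max depth of their children.
parents = np.array([3, 3, 4, 4, 4])   # 0,1 -> 3 ; 2,3 -> 4 ; 4 is the root
depth = np.zeros(len(parents), dtype=int)
for i in range(len(parents)):         # children have smaller indices here
    p = parents[i]
    if p != i:                        # skip the root (its own parent)
        depth[p] = max(depth[p], depth[i] + 1)
print(depth)                          # [0 0 0 1 2]
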
dijkstra(seed=0)

returns all the [graph] geodesic distances starting from seed; the graph weights must be non-negative

Parameters :

seed (int, >-1,<self.V) or array of shape(p) :

vertex (or vertices) from which the distances are computed

Returns :

dg: array of shape (self.V) , :

the graph distance dg from any vertex to the nearest seed

eps(X, eps=1.0)

Sets the graph to be the eps-nearest-neighbours graph of the data

Parameters :

X: array of shape (self.V) or (self.V,p) :

where p = dimension of the features data used for eps-neighbours computation

eps=1. (float), the neighborhood width :

Returns :

self.E the number of edges of the resulting graph :

fancy_plot(validleaves)

Same as plot, but the valid edges are enhanced

fancy_plot_(valid)

Same as plot, but the valid edges are enhanced

Returns :ax, the axes handle of the plot :
floyd(seed=None)

Compute all the geodesic distances starting from the seeds; the graph weights must be non-negative

Parameters :

seed= None: array of shape (nbseed), type np.int :

vertex indices from which the distances are computed; if seed==None, then every vertex is a seed point

Returns :

dg array of shape (nbseed,self.V) :

the graph distance dg from each seed to any vertex

from_3d_grid(xyz, k=18)

Sets the graph to be the topological neighbours graph of the three-dimensional coordinates set xyz, in the k-connectivity scheme

Parameters :

xyz: array of shape (self.V, 3) and type np.int, :

k = 18: the number of neighbours considered. (6, 18 or 26) :

Returns :

E(int): the number of edges of self :

from_adjacency(A)

sets the edges of self according to the adjacency matrix A

Parameters :A: array of shape (self.V, self.V) :
get_E()

To get the number of edges in the graph

get_V()

To get the number of vertices in the graph

get_children(v=-1)

returns the list of children arrays for all the nodes of the forest if v == -1, or the children of v otherwise

get_descendents(v)

returns the nodes that are descendants of v

get_edges()

To get the graph’s edges

get_height()

get the height array

get_vertices()

To get the graph’s vertices (as id)

get_weights()
is_connected()

States whether self is connected or not

isleaf()

returns a bool array of shape (self.V) such that isleaf == 1 iff the node is a leaf in the forest (has no children)

isroot()

returns a bool array of shape (self.V) such that isroot == 1 iff the node is a root in the forest, i.e. is its own parent
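
The leaf and root definitions above can be illustrated with plain numpy on a made-up parents array (an illustration of the definitions, not the library code):

import numpy as np

parents = np.array([3, 3, 4, 4, 4])        # 0,1 -> 3 ; 2,3 -> 4 ; 4 is a root
nodes = np.arange(len(parents))

isroot = parents == nodes                  # a root is its own parent
isleaf = ~np.isin(nodes, parents[~isroot]) # a leaf is nobody's parent
print(isroot)                              # [False False False False  True]
print(isleaf)                              # [ True  True  True False False]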

knn(X, k=1)

E = knn(X, k) sets the graph to be the k-nearest-neighbours graph of the data

Parameters :

X array of shape (self.V) or (self.V,p) :

where p = dimension of the features data used for the knn computation

k=1 : is the number of neighbours considered

Returns :

- self.E (int): the number of edges of the resulting graph :

leaves_of_a_subtree(ids, custom=False)

tests whether the given nodes within ids represent all the leaves of a certain subtree of self

Parameters :

ids: array of shape (n) that takes values in [0..self.V-1] :

custom == False, boolean :

if custom == True, the behaviour of the function is more specific:
- the different connected components are considered as being in the same greater tree
- when a node has more than two sub-branches, any subset of these children is considered as a subtree

left_incidence()
Returns :

the left incidence matrix of self :

as a list of lists: i.e. the list [[e.0.0, .., e.0.i(0)], .., [e.V.0, .., e.V.i(V)]], where e.i.j is the set of edge indices such that e.i.j[0] = i

list_of_neighbors()

returns the set of neighbors of self as a list of arrays

list_of_subtrees()

returns the list of all non-trivial subtrees in the graph. Caveat: this function assumes that the vertices are sorted such that parents[i] > i for all i. Only the leaves are listed, not the subtrees themselves

main_cc()

Returns the indexes of the vertices within the main cc

Returns :idx: array of shape (sizeof main cc) :
merge_simple_branches()

merge the branches of the forest that are the only child of their parent branch into their child

mst(X)

makes self the MST of the array X

Parameters :

X: an array of shape (self.V,dim) :

dim is the feature dimension of X

Returns :

tl (float) the total length of the mst :

normalize(c=0)

Normalize the graph according to the index c. Normalization means that the values of the edges that go into or out of each vertex must sum to 1

Parameters :

c=0 in {0,1,2}, optional: index that designates the way according to which D is normalized :

- c == 0 => for each vertex a, sum{edge[e,0]=a} D[e] = 1
- c == 1 => for each vertex b, sum{edge[e,1]=b} D[e] = 1
- c == 2 => symmetric ('l2') normalization
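
The c == 0 rule can be illustrated with plain numpy on hypothetical edge and weight arrays (an illustration of the normalization rule, not the class internals):

import numpy as np

# For each vertex a, the weights of the edges with edges[e, 0] == a
# are rescaled so that they sum to 1.
edges = np.array([[0, 1], [0, 2], [1, 2]])
weights = np.array([2.0, 2.0, 5.0])

out_sum = np.bincount(edges[:, 0], weights=weights, minlength=3)
normalized = weights / out_sum[edges[:, 0]]
print(normalized)   # 0.5 0.5 1.0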

partition(threshold)

partition the tree according to a cut criterion

plot(ax=None)

Plot the dendrogram associated with self; the rank of the data in the dendrogram is returned

Parameters :ax: axis handle, optional :
Returns :ax, the axis handle :
plot2(addNodes=False, font_size=10, cl_size=None)

Plot the dendrogram associated with self

plot_height()

plot the height of the non-leaf nodes

propagate_upward(label)

label = self.propagate_upward(label). Assuming that label is a positive integer field (i.e. labels) defined at the leaves of the tree and that its values can be compared, this propagates the labels to the parents whenever the child nodes have coherent values; otherwise the parent value is left unchanged

Parameters :label: array of shape(self.V) :
Returns :label: array of shape(self.V) :
propagate_upward_and(prop)

propagates some binary property in the forest that is defined in the leaves so that prop[parents] = logical_and(prop[children])

Parameters :prop, array of shape(self.V), the input property :
Returns :prop, array of shape(self.V), the output property field :
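
The stated rule, prop[parent] = logical_and over the children, can be mimicked with a small pure-numpy loop (an illustration of the semantics, not the library code; the parents and prop arrays are made up):

import numpy as np

# A parent becomes True only if all of its children are True.
parents = np.array([3, 3, 4, 4, 4])                  # 0,1 -> 3 ; 2,3 -> 4 ; 4 is a root
prop = np.array([True, True, False, False, False])   # defined at the leaves 0, 1, 2

children = {}
for i, p in enumerate(parents):
    if p != i:                                       # a root is its own parent
        children.setdefault(p, []).append(i)

for node in range(len(parents)):                     # children have smaller indices here
    if node in children:
        prop[node] = all(prop[c] for c in children[node])

print(prop)   # [ True  True False  True False]
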
remove_edges(valid)

Removes all the edges for which valid==0

Parameters :valid, an array of shape (self.E) :
remove_trivial_edges()

Removes trivial edges, i.e. edges of the form (v, v); self.weights and self.E are corrected accordingly

Returns :- self.E (int): The number of edges :
reorder(c=0)

Reorder the graph according to the index c

Parameters :

c=0 in {0,1,2}: index that designates the array according to which the vectors are jointly reordered :

- c == 0 => reordering makes edges[:,0] increasing, and edges[:,1] increasing for edges[:,0] fixed
- c == 1 => reordering makes edges[:,1] increasing, and edges[:,0] increasing for edges[:,1] fixed
- c == 2 => reordering makes weights increasing
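
The c == 0 ordering can be reproduced on hypothetical edge and weight arrays with numpy's lexsort (an illustration of the rule, not the class implementation):

import numpy as np

# Sort edges by their first column, then by the second; reorder weights jointly.
edges = np.array([[1, 0], [0, 2], [0, 1]])
weights = np.array([0.3, 0.1, 0.2])

order = np.lexsort((edges[:, 1], edges[:, 0]))
edges, weights = edges[order], weights[order]
print(edges)     # [[0 1] [0 2] [1 0]]
print(weights)   # [0.2 0.1 0.3]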

reorder_from_leaves_to_roots()

reorder the tree so that the leaves come first, then their parents, and so on, with the roots last. The returned order is the permutation that has to be applied to all vertex-based information

Returns :

order: array of shape(self.V) :

the order of the old vertices in the reordered graph

right_incidence()
Returns :

the right incidence matrix of self :

as a list of lists: i.e. the list [[e.0.0, .., e.0.i(0)], .., [e.V.0, .., e.V.i(V)]], where e.i.j is the set of edge indices such that e.i.j[1] = i

rooted_subtree(k)

l = self.rooted_subtree(k) returns an array of the nodes included in the subtree rooted at k

Parameters :k (int): the vertex from which the subtree is searched :
Returns :idx : array of shape>=1 the index of the nodes beneath k
set_edges(edges)

Sets the graph’s edges

set_euclidian(X)

Compute the weights of the graph as the distances between the corresponding rows of X, which represents an embedding of self

Parameters :

X array of shape (self.V, edim), :

the coordinate matrix of the embedding

set_gaussian(X, sigma=0)

Compute the weights of the graph as a Gaussian function of the distance between the corresponding rows of X, which represents an embedding of self

Parameters :

X array of shape (self.V,dim) :

the coordinate matrix of the embedding

sigma=0, float : the parameter of the Gaussian function

set_height(height=None)

set the height array

set_weights(weights)
Parameters :weights : an array of shape (self.E), the edge weights
show(X=None, ax=None)

a = self.show(X=None) plots the current graph in 2D

Parameters :

X=None, array of shape (self.V,2) :

a set of coordinates that can be used to embed the vertices in 2D. If X.shape[1] > 2, an SVD reduces X for display. By default, the graph is presented on a circle

ax: ax handle, optional :

Returns :

ax: axis handle :

skeleton()

returns an MST based on self.weights. Note: self must be connected

split(k)

Same as partition, but the number of components is supplied instead of a threshold

subforest(valid)

creates a subforest with the vertices for which valid > 0 and with the corresponding set of edges; the children of deleted vertices become their own parents

Parameters :valid: array of shape (self.V) :
Returns :a new forest instance :
subgraph(valid)

Creates a subgraph with the vertices for which valid > 0 and with the corresponding set of edges

Parameters :valid array of shape (self.V): nonzero for vertices to be retained :
Returns :G WeightedGraph instance, the desired subgraph of self :
symmeterize()

symmetrizes the graph self, i.e. produces the graph whose adjacency matrix is the symmetric part of the current adjacency matrix

to_coo_matrix()
Returns :

sp: scipy.sparse matrix instance, :

that encodes the adjacency matrix of self

to_neighb()

converts the graph to a neighboring system. The neighboring system is nothing but a (sparser) representation of the edge matrix

Returns :

ci, ne, we: arrays of shape (self.V+1), (self.E), (self.E) :

such that self.edges and self.weights are encoded as follows: for j in [ci[a], ci[a+1]), there exists an edge e such that (edges[e,0] = a, edges[e,1] = ne[j], self.weights[e] = we[j])
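
A hand-built example of this (ci, ne, we) encoding and its decoding loop (the values are hypothetical, not produced by the library):

import numpy as np

# Vertex 0 has neighbours 1 and 2, vertex 1 has neighbour 0, vertex 2 has none.
ci = np.array([0, 2, 3, 3])        # shape (V + 1,): offsets into ne and we
ne = np.array([1, 2, 0])           # shape (E,): neighbour of each edge
we = np.array([0.5, 0.2, 0.5])     # shape (E,): weight of each edge

for a in range(len(ci) - 1):
    for j in range(ci[a], ci[a + 1]):
        print('edge (%d, %d) with weight %g' % (a, ne[j], we[j]))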

tree_depth()

return the maximal depth of any node in the tree

Functions

Average link clustering based on a pairwise distance matrix.

Parameters :

D array of shape (nbitem,nbitem) with nonnegative values :

distance matrix between some data items

verbose=0, verbosity level :

Returns :

t a WeightedForest structure that represents the dendrogram of the data :

Average link clustering based on a pairwise distance matrix.

Parameters :

D: a (n,n) distance matrix between some items :

stop=-1: stopping criterion, i.e. distance threshold at which :

further merges are forbidden

By default, all merges are performed

qmax=1: the number of desired clusters :

(in the limit of stop)

verbose=0, verbosity level :

Returns :

u: a labelling of the graph vertices according to the criterion :

cost the cost of each merge step during the clustering procedure :

Average link clustering based on data matrix.

Parameters :

X array of shape (nbitem,dim): data matrix :

from which a Euclidean distance matrix is computed

verbose=0, verbosity level :

Returns :

t a WeightedForest structure that represents the dendrogram of the data :

Agglomerative function based on a (hopefully sparse) similarity graph

Parameters :G the input graph :
Returns :t a WeightedForest structure that represents the dendrogram of the data :

Agglomerative function based on a (hopefully sparse) similarity graph

Parameters :

G the input graph :

stop=0: the stopping criterion :

qmax=1: the number of desired clusters :

(in the limit of the stopping criterion)

Returns :

u: array of shape (G.V) :

a labelling of the graph vertices according to the criterion

cost: array of shape (G.V (?)) :

the cost of each merge step during the clustering procedure

nipy.neurospin.clustering.hierarchical_clustering.fusion(K, pop, i, j, k)

Modifies the graph K to merge nodes i and j into node k. The similarity values are averaged, with pop[i] and pop[j] yielding the relative weights. This is used in average_link_slow (deprecated)

Maximum link clustering based on a pairwise distance matrix.

Parameters :

D: array of shape (n,n) :

distance matrix between data items

verbose=0, verbosity level :

Returns :

t a WeightedForest structure that represents the dendrogram of the data :

Maximum link clustering based on a pairwise distance matrix.

Returns :

u: array of shape (nbitems) :

a labelling of the graph vertices according to the criterion

cost: array of shape (nbitems-1) :

the cost of each merge step during the clustering procedure

Maximum link clustering based on data matrix.

Parameters :

X: array of shape (nbitem,dim) :

each row corresponds to a point to cluster

verbose=0, verbosity level :

Returns :

t a WeightedForest structure that represents the dendrogram of the data :

nipy.neurospin.clustering.hierarchical_clustering.ward(G, feature, verbose=0)

Agglomerative function based on a topology-defining graph and a feature matrix.

Returns :t: a WeightedForest structure that represents the dendrogram :
nipy.neurospin.clustering.hierarchical_clustering.ward_field_segment(F, stop=-1, qmax=-1, verbose=0)

Agglomerative function based on a field structure

Parameters :

F the input field (graph+feature) :

stop = -1: the stopping criterion :

if stop==-1, then no stopping criterion is used

qmax=1: the maximum number of desired clusters :

(in the limit of the stopping criterion)

Returns :

u: array of shape (F.V) :

labelling of the graph vertices according to the criterion

cost array of shape (F.V-1) :

the cost of each merge step during the clustering procedure

nipy.neurospin.clustering.hierarchical_clustering.ward_quick(G, feature, verbose=0)

Agglomerative function based on a topology-defining graph and a feature matrix.

Parameters :

G graph instance, :

topology-defining graph

feature: array of shape (G.V,dim_feature): :

some vectorial information related to the graph vertices

Returns :

t: WeightedForest instance, :

that represents the dendrogram of the data

nipy.neurospin.clustering.hierarchical_clustering.ward_quick_segment(G, feature, stop=-1, qmax=1, verbose=0)

Agglomerative function based on a topology-defining graph and a feature matrix.

Parameters :

G: neurospin.graph.WeightedGraph instance :

the input graph (a topological graph essentially)

feature array of shape (G.V,dim_feature) :

vectorial information related to the graph vertices

stop = -1: the stopping criterion :

if stop==-1, then no stopping criterion is used

qmax=1: the maximum number of desired clusters :

(in the limit of the stopping criterion)

Returns :

u: array of shape (G.V) :

labelling of the graph vertices according to the criterion

cost: array of shape (G.V-1) :

the cost of each merge step during the clustering procedure

nipy.neurospin.clustering.hierarchical_clustering.ward_segment(G, feature, stop=-1, qmax=1, verbose=0)

Agglomerative function based on a topology-defining graph and a feature matrix.

Parameters :

G the input graph (a topological graph essentially) :

feature array of shape (G.V,dim_feature) :

some vectorial information related to the graph vertices

stop = -1: the stopping criterion :

if stop==-1, then no stopping criterion is used

qmax=1: the maximum number of desired clusters :

(in the limit of the stopping criterion)

Returns :

u: array of shape (G.V): :

a labelling of the graph vertices according to the criterion

cost: array of shape (G.V-1) :

the cost of each merge step during the clustering procedure

NOTE: :

A Euclidean distance is used in the feature space :

Caveat: when the number of connected components in G (nbcc) is greater than qmax, u contains nbcc values, not qmax!

nipy.neurospin.clustering.hierarchical_clustering.ward_simple(feature, verbose=0)

Ward clustering based on a Feature matrix

Returns :t a WeightedForest structure that represents the dendrogram of the data :