Module: neurospin.clustering.gmm

Inheritance diagram for nipy.neurospin.clustering.gmm:

Gaussian Mixture Model class: contains the basic fields and methods of GMMs. The high-level functions are/should be bound in C.

Author : Bertrand Thirion, 2006-2009

Classes

BGMM

class nipy.neurospin.clustering.gmm.BGMM(k=1, dim=1, prec_type=1, centers=None, precision=None, weights=None)

Bases: nipy.neurospin.clustering.gmm.GMM

This class implements Bayesian diagonal GMMs (prec_type = 1). Besides the standard fields of GMMs, this class contains the following fields:

- prior_centers : array of shape (k, dim): the prior on the component means
- prior_precision : array of shape (k, dim): the prior on the component precisions
- prior_dof : array of shape (k): the prior on the dof (should be at least equal to dim)
- prior_mean_scale : array of shape (k): scaling factor of the prior precision on the mean
- prior_weights : array of shape (k): the prior on the component weights
- mean_scale : array of shape (k): scaling factor of the posterior precision on the mean
- dof : array of shape (k): the posterior dofs

A short usage sketch is given after the method list below.

__init__(k=1, dim=1, prec_type=1, centers=None, precision=None, weights=None)
Gibbs_estimate(X, niter=1000, method=1)
Estimation of the BGMM using Gibbs sampling.
INPUT:
- X : array of shape (nbitems, dim), the input data
- niter = 1000 : the maximal number of iterations of the Gibbs sampling
- method = 1 : boolean stating whether the covariances are fixed (0; normal model) or variable (1; normal-Wishart model)
OUTPUT:
- label : array of shape (nbitems), the resulting MAP labelling
Gibbs_estimate_and_sample(X, niter=1000, method=1, gd=None, nsamp=1000, verbose=0)
Estimation of the BGMM using Gibbs sampling, and sampling of the posterior on test points.
INPUT:
- X : array of shape (nbitems, dim), the input data
- niter = 1000 : the maximal number of iterations of the Gibbs sampling
- method = 1 : boolean stating whether the covariances are fixed (0; normal model) or variable (1; normal-Wishart model)
- gd = None : a grid descriptor, i.e. the grid on which the model is sampled; if gd == None, X is used as the grid
- nsamp = 1000 : number of draws of the posterior
- verbose = 0 : the verbosity level
OUTPUT:
- Li : array of shape (nbnodes), the average log-posterior
- label : array of shape (nbitems), the resulting MAP labelling
VB_estimate(X, niter=100, delta=0.0001)
Estimation of the BGMM using a Variational Bayes approach.
INPUT:
- X : array of shape (nbitems, dim), the input data
- niter = 100 : the maximal number of iterations of the VB algorithm
- delta = 0.0001 : the increment in log-likelihood to declare convergence
OUTPUT:
- label : array of shape (nbitems), the resulting MAP labelling
VB_estimate_and_sample(X, niter=1000, delta=0.0001, gd=None, verbose=0)
Estimation of the BGMM using a Variational Bayes approach, and sampling of the model on test points in order to have an estimate of the posterior on these points.
INPUT:
- X : array of shape (nbitems, dim), the input data
- niter = 1000 : the maximal number of iterations of the VB algorithm
- delta = 0.0001 : the increment in log-likelihood to declare convergence
- gd = None : a grid descriptor, i.e. the grid on which the model is sampled; if gd == None, X is used as the grid
- verbose = 0 : the verbosity mode
OUTPUT:
- Li : array of shape (nbnodes), the average log-posterior
- label : array of shape (nbitems), the resulting MAP labelling
VB_sample(gd, X=None)
Sampling of the BGMM model on test points (the 'grid') in order to have an estimate of the posterior on these points.
INPUT:
- gd : a grid descriptor, i.e. the grid on which the BGMM is sampled
- X = None : used for plotting (empirical data)
OUTPUT:
- Li : array of shape (nbnodes, self.k), the posterior for each node and component
check_priors()
Check that the main fields have correct dimensions.
sample_on_data(grid)
Sampling of the BGMM model on test points (the 'grid') in order to have an estimate of the posterior on these points.
INPUT:
- grid : a set of points from which the posterior should be sampled
OUTPUT:
- Li : array of shape (nbnodes, self.k), the posterior for each node and component
set_empirical_priors(X)
Set the priors in a natural (almost uninformative) fashion, given a dataset X.
INPUT:
- X : the data from which the BGMM priors are derived
set_priors(prior_centers=None, prior_weights=None, prior_precision=None, prior_dof=None, prior_mean_scale=None)
Set the prior of the BGMM
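
To make the workflow concrete, here is a minimal, untested sketch of how these methods combine; it follows the signatures and INPUT/OUTPUT descriptions above rather than any shipped nipy example, and the simulated data are purely illustrative:

    import numpy as np
    from nipy.neurospin.clustering.gmm import BGMM

    np.random.seed(1)
    X = np.random.randn(200, 3)              # 200 items in dimension 3 (illustrative data)

    bgmm = BGMM(k=2, dim=3, prec_type=1)      # Bayesian diagonal GMM with 2 components
    bgmm.set_empirical_priors(X)              # almost uninformative priors derived from X

    # Variational Bayes fit; returns the MAP labelling per the OUTPUT description
    label_vb = bgmm.VB_estimate(X, niter=100, delta=0.0001)

    # Alternatively, a Gibbs sampling fit with the normal-Wishart model (method=1)
    label_gibbs = bgmm.Gibbs_estimate(X, niter=1000, method=1)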

GMM

class nipy.neurospin.clustering.gmm.GMM(k=1, dim=1, prec_type=1, centers=None, precision=None, weights=None)

This is the basic GMM class.

- GMM.k is the number of components in the mixture
- GMM.dim is the dimension of the data
- GMM.centers is an array of shape (GMM.k, GMM.dim) that contains the centers of all the components
- GMM.precision is an array that contains the precisions of all the components; its shape varies according to GMM.prec_type
- GMM.prec_type is the type of the precision matrix:
  - 0 : full covariance matrix, one for each component; shape = (GMM.k, GMM.dim**2)
  - 1 : diagonal covariance matrix, one for each component; shape = (GMM.k, GMM.dim)
  - 2 : diagonal covariance matrix, the same for all components; shape = (1, GMM.dim)
- GMM.weights contains the weights of the components in the mixture
- GMM.estimated is a binary variable that indicates whether the model has been instantiated or not
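
As a concrete illustration of these fields, the following sketch builds a small diagonal GMM by hand; it is written against the constructor signature and shape conventions listed above and is not tested against any particular nipy release:

    import numpy as np
    from nipy.neurospin.clustering.gmm import GMM

    k, dim = 3, 2
    centers = np.array([[0., 0.], [3., 3.], [-3., 3.]])   # shape (k, dim)
    precision = np.ones((k, dim))                          # prec_type=1: one diagonal precision per component
    weights = np.ones(k) / k                               # uniform mixing weights

    gmm = GMM(k=k, dim=dim, prec_type=1, centers=centers,
              precision=precision, weights=weights)
    gmm.check()                                            # verify that the array shapes are consistent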

__init__(k=1, dim=1, prec_type=1, centers=None, precision=None, weights=None)
BIC(LL, n)
Compute the value of the BIC criterion for the current GMM, given its average log-likelihood LL and the number of data points n.
assess_divergence()
check()
Check the shapes of the different matrices involved in the model.
check_data(data)
Check that the data is in the correct format.
estimate(data, Labels=None, maxiter=300, delta=0.001, ninit=1)
Estimation of the GMM based on data, using an EM algorithm.
INPUT
- data : (n*p) feature array, n = nb items, p = feature dimension
- Labels = None : prior labelling of the data (this may improve convergence)
- maxiter = 300 : maximal number of iterations of the EM algorithm
- delta = 0.001 : criterion on the log-likelihood increments to declare convergence
- ninit = 1 : number of initializations of the GMM estimation
OUTPUT
- Labels : (n) array of type ('i'), discrete labelling of the data items into clusters
- LL : (float) average log-likelihood of the data
- BIC : (float) associated BIC criterion
optimize_with_BIC(data, kvals=None, maxiter=300, delta=0.001, ninit=1, verbose=0)
Find the optimal GMM using the BIC criterion. The method is run with all the values in kvals for k.
INPUT
- data : (n*p) feature array, n = nb items, p = feature dimension
- kvals = None : range of values for k
- maxiter = 300 : maximal number of iterations of the EM algorithm
- delta = 0.001 : criterion on the log-likelihood increments to declare convergence
- ninit = 1 : number of initializations of the GMM estimation
- verbose = 0 : verbosity mode
OUTPUT
- Labels : (n) array of type ('i'), discrete labelling of the data items into clusters
- LL : (float) average log-likelihood of the data
- BIC : (float) associated BIC criterion
partition(data)
Partitioning the data according to the GMM model.
INPUT
- data : (n*p) feature array, n = nb items, p = feature dimension
OUTPUT
- Labels : (n) array of type ('i'), discrete labelling of the data items into clusters
- LL : (n) array of type ('d'), log-likelihood of the data
sample(gd, X, verbose=0)
Evaluating the GMM on some new data.
INPUT
- data : (n*p) feature array, n = nb items, p = feature dimension
OUTPUT
- Labels : (n) array of type ('i'), discrete labelling of the data items into clusters
- LL : (n) array of type ('d'), log-likelihood of the data
set_k(k)
Set the value of k.
show(X, gd, density=None, nbf=-1)
Function to plot a GMM (work in progress). Currently works only in 1D and 2D.
show_components(X, gd, density=None, nbf=-1)
Function to plot a GMM (work in progress). Currently works only in 1D and 2D.
test(data)

Evaluating the GMM on some new data.
INPUT
- data : (n*p) feature array, n = nb items, p = feature dimension
OUTPUT
- LL : (n) array of type ('d'), log-likelihood of the data
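
Putting the estimation methods together, a typical fitting session might look like the sketch below; the unpacking of the return values simply follows the OUTPUT descriptions above and may need adjusting for a given nipy version:

    import numpy as np
    from nipy.neurospin.clustering.gmm import GMM

    # Simulated 2D data drawn from two well-separated blobs (illustrative only)
    np.random.seed(0)
    X = np.vstack([np.random.randn(100, 2),
                   np.random.randn(100, 2) + 4.0])

    # Fit a 2-component diagonal GMM with EM, then read off the labelling
    gmm = GMM(k=2, dim=2, prec_type=1)
    labels, ll, bic = gmm.estimate(X, maxiter=300, delta=0.001)

    # Alternatively, let the BIC criterion choose k over a range of candidate values
    gmm2 = GMM(dim=2, prec_type=1)
    labels2, ll2, bic2 = gmm2.optimize_with_BIC(X, kvals=range(1, 6))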

grid_descriptor

class nipy.neurospin.clustering.gmm.grid_descriptor(dim=1)

A tiny class to handle Cartesian grids.

__init__(dim=1)
getinfo(lim, nbs)
make_grid()
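
Since the class carries no docstrings here, the following sketch only illustrates the presumed call pattern; reading lim as the grid bounds and nbs as the number of steps per dimension is an assumption, not something documented above:

    from nipy.neurospin.clustering.gmm import grid_descriptor

    gd = grid_descriptor(dim=2)
    gd.getinfo([0., 1., 0., 1.], [20, 20])   # assumed: bounds per dimension, then steps per dimension
    grid = gd.make_grid()                    # assumed: returns the array of grid nodes
    # gd can then be passed as the 'gd' argument of GMM.sample or BGMM.VB_sample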