A Bayesian Alternative to Mutual Information for the Hierarchical Clustering of Dependent Random Variables

Using mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: a correction must be applied for the dimensionality of the variables. In this work, we formulate the decision to merge dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. In addition, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives to mutual information (in exact and asymptotic forms) on simulated and real data. In simulations, hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, the Bayesian approaches led to results similar to those of mutual information clustering techniques, with the added advantage of automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, the Bayesian approaches identified clusters consistent with the established outcome of standard procedures. In this application, normalized mutual information behaved in a highly atypical way, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
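To make the asymptotic relationship described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard result that, for multivariate normal blocks of dimensions d_x and d_y observed N times, the plug-in mutual information is half the difference of log-determinants of the sample covariance blocks, and that a BIC-style penalty for the d_x * d_y extra cross-covariance parameters is (d_x * d_y / 2) log N. All function names (`covariance`, `logdet`, `plugin_mutual_information`, `bic_merge_score`) are hypothetical, and the exact Bayes factor with its prior shrinkage is not reproduced here.

```python
import math


def covariance(rows):
    """Maximum-likelihood (divide by N) covariance of equal-length samples."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]


def logdet(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(matrix)
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(d))


def submatrix(matrix, idx):
    """Square sub-block of `matrix` restricted to the indices in `idx`."""
    return [[matrix[i][j] for j in idx] for i in idx]


def plugin_mutual_information(cov, idx_x, idx_y):
    """Gaussian plug-in mutual information (nats) between two variable blocks."""
    joint = list(idx_x) + list(idx_y)
    return 0.5 * (logdet(submatrix(cov, idx_x))
                  + logdet(submatrix(cov, idx_y))
                  - logdet(submatrix(cov, joint)))


def bic_merge_score(rows, idx_x, idx_y):
    """Assumed asymptotic score: N * MI minus a BIC penalty on d_x * d_y parameters."""
    n = len(rows)
    mi = plugin_mutual_information(covariance(rows), idx_x, idx_y)
    return n * mi - 0.5 * len(idx_x) * len(idx_y) * math.log(n)


# A balanced +/-1 design has exactly zero sample cross-covariance,
# so the score reduces to the (negative) dimensionality penalty.
rows = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
print(bic_merge_score(rows, [0], [1]) < 0)  # True: the two variables are not merged
```

Under these assumptions, a positive score favors merging the two blocks and a negative score favors keeping them apart, which is one way to read the automated stopping rule: the agglomeration can halt once no candidate pair has a positive score.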
