Stochastic Data Clustering

In 1961 Herbert Simon and Albert Ando [Econometrica, 29 (1961), pp. 111--138] published the theory behind the long-term behavior of a dynamical system that can be described by a nearly uncoupled matrix. Over the past fifty years this theory has been used in a variety of contexts, including queueing theory, brain organization, and ecology. In all of these applications, the structure of the system is known, and the point of interest is the sequence of stages the system passes through on its way to long-term equilibrium. This paper approaches the problem from the other direction: we develop a technique for using the evolution of the system to infer its initial structure, and then use this technique to build an algorithm that combines the varied solutions from multiple data clustering algorithms into a single data clustering solution.
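The idea of reading structure off a system's evolution can be illustrated with a minimal sketch. It is not the paper's algorithm, only a toy under stated assumptions: several clusterings are summarized in a consensus matrix (how often each pair of points is clustered together), the matrix is scaled toward a doubly stochastic one by alternating row and column normalization (a Sinkhorn-Knopp style scaling, chosen here as an assumption), and a probability vector is evolved for a few steps. If the matrix is nearly uncoupled, entries belonging to the same block settle to similar values well before the whole vector becomes uniform, so a large gap in the sorted entries suggests a cluster boundary. All function names, the toy labelings, the step count, and the largest-gap rule are hypothetical choices for illustration.

```python
def consensus_matrix(labelings):
    """Entry (i, j) counts how many clusterings put points i and j together."""
    n = len(labelings[0])
    return [[sum(1 for lab in labelings if lab[i] == lab[j]) for j in range(n)]
            for i in range(n)]


def sinkhorn_knopp(matrix, sweeps=200):
    """Alternately normalize rows and columns toward a doubly stochastic matrix.

    Assumes every row and column has a positive sum (true for a consensus
    matrix, whose diagonal is positive).
    """
    p = [[float(v) for v in row] for row in matrix]
    n = len(p)
    for _ in range(sweeps):
        for row in p:
            s = sum(row)
            for j in range(n):
                row[j] /= s
        for j in range(n):
            s = sum(p[i][j] for i in range(n))
            for i in range(n):
                p[i][j] /= s
    return p


def evolve(x, p, steps):
    """Apply x^T <- x^T P the given number of times."""
    n = len(x)
    for _ in range(steps):
        x = [sum(x[i] * p[i][j] for i in range(n)) for j in range(n)]
    return x


def split_at_largest_gap(x):
    """Split indices into two groups at the largest gap in the sorted values."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    gaps = [x[order[k + 1]] - x[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return sorted(order[:cut]), sorted(order[cut:])


# Toy input: five clusterings of six points; the last one disagrees on point 2.
labelings = [[0, 0, 0, 1, 1, 1]] * 4 + [[0, 0, 1, 1, 1, 1]]
P = sinkhorn_knopp(consensus_matrix(labelings))
x = evolve([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], P, steps=5)
low, high = split_at_largest_gap(x)
print(low, high)
```

On this toy input the disagreement about point 2 couples the two blocks only weakly, so after a few steps the probability mass is still concentrated in the block containing the starting point, and the largest gap separates points 0-2 from points 3-5. Running many more steps would drive every entry toward 1/6 and erase the gap, which is why the intermediate stages of the evolution, rather than the final equilibrium, carry the structural information.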
