Learning High-Dimensional Mixtures of Graphical Models

We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable corresponding to the mixture components is hidden and each mixture component over the observed variables can have a different Markov graph structure and different parameters. We propose a novel method for estimating the mixture components; its output is a tree-mixture model that serves as a good approximation to the underlying graphical model mixture. The method is efficient when the union graph, defined as the union of the Markov graphs of the mixture components, has sparse vertex separators between any pair of observed variables. This class includes tree mixtures and mixtures of bounded-degree graphs. For such models, we prove that our method correctly recovers the union graph structure and the tree structures corresponding to the maximum-likelihood tree approximations of the mixture components. The sample and computational complexities of our method scale as $\mathrm{poly}(p, r)$ for an $r$-component mixture of $p$-variate graphical models. We further extend our results to the case where the union graph has sparse local separators between any pair of observed variables, as in mixtures of locally tree-like graphs, and the mixture components are in the regime of correlation decay.
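
To illustrate what a maximum-likelihood tree approximation of a single discrete distribution looks like, the following minimal sketch implements the classical Chow-Liu construction: it computes empirical pairwise mutual information and returns a maximum-weight spanning tree. It is not the mixture-estimation method proposed here, since it assumes fully observed samples from one component with no hidden class variable. The function names `mutual_information` and `chow_liu_tree` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    joint = Counter((row[i], row[j]) for row in samples)
    marg_i = Counter(row[i] for row in samples)
    marg_j = Counter(row[j] for row in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts.
        mi += (count / n) * math.log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def chow_liu_tree(samples):
    """Edges of a maximum-weight spanning tree under empirical mutual information.

    Illustrative assumption: `samples` is a list of equal-length tuples of
    discrete values, all drawn from a single (non-mixture) distribution.
    """
    p = len(samples[0])
    weighted = sorted(
        ((mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(p), 2)),
        reverse=True,
    )
    parent = list(range(p))  # union-find forest for Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding (i, j) does not create a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges
```

Calling `chow_liu_tree(samples)` on a list of discrete observation tuples returns `p - 1` edges as index pairs. Kruskal's algorithm is used for brevity; any maximum-weight spanning tree routine would serve equally well.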
