Spectral Methods for Learning Multivariate Latent Tree Structure

This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.

[1]  J. Wishart SAMPLING ERRORS IN THE THEORY OF TWO FACTORS , 1928 .

[2]  M. Bartlett Further aspects of the theory of multiple regression , 1938, Mathematical Proceedings of the Cambridge Philosophical Society.

[3]  H. Kesten,et al.  Additional Limit Theorems for Indecomposable Multidimensional Galton-Watson Processes , 1966 .

[4]  C. N. Liu,et al.  Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.

[5]  R. Muirhead,et al.  Asymptotic distributions in canonical correlation analysis and other multivariate procedures for nonnormal populations , 1980 .

[6]  Judea Pearl,et al.  Structuring causal trees , 1986, J. Complex..

[7]  N. Saitou,et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees. , 1987, Molecular biology and evolution.

[8]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[9]  P. Erdös,et al.  A few logs suffice to build (almost) all trees (l): part I , 1997 .

[10]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[11]  Michael I. Jordan Graphical Models , 1998 .

[12]  Tandy J. Warnow,et al.  A Few Logs Suffice to Build (almost) All Trees: Part II , 1999, Theor. Comput. Sci..

[13]  Tandy J. Warnow,et al.  A few logs suffice to build (almost) all trees (I) , 1999, Random Struct. Algorithms.

[14]  Nir Friedman,et al.  Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm , 1999, UAI.

[15]  A. Satorra Structural Equation Models with Latent Variables , 2002 .

[16]  T. Wan Structural Equation Models with Latent Variables , 2002 .

[17]  Elchanan Mossel Phase transitions in phylogeny , 2003, Transactions of the American Mathematical Society.

[18]  Hideyuki Tanaka,et al.  Subspace Identification of Linear Systems with Observation Outliers , 2004 .

[19]  David Maxwell Chickering,et al.  Large-Sample Learning of Bayesian Networks is NP-Hard , 2002, J. Mach. Learn. Res..

[20]  Elchanan Mossel,et al.  Evolutionary trees and the Ising model on the Bethe lattice: a proof of Steel’s conjecture , 2005, ArXiv.

[21]  Elchanan Mossel,et al.  Optimal phylogenetic reconstruction , 2005, STOC '06.

[22]  Joseph T. Chang,et al.  A signal-to-noise analysis of phylogeny estimation by neighbor-joining: Insufficiency of polynomial length sequences. , 2006, Mathematical biosciences.

[23]  Elchanan Mossel,et al.  Learning nonsingular phylogenies and hidden Markov models , 2005, Symposium on the Theory of Computing.

[24]  Sanjoy Dasgupta,et al.  A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians , 2007, J. Mach. Learn. Res..

[25]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[26]  Sham M. Kakade,et al.  A spectral algorithm for learning Hidden Markov Models , 2008, J. Comput. Syst. Sci..

[27]  Sanjoy Dasgupta,et al.  Learning Mixtures of Gaussians using the k-means Algorithm , 2009, ArXiv.

[28]  C. Matias,et al.  Identifiability of parameters in latent structure models with many observed variables , 2008, 0809.5032.

[29]  J. Lafferty,et al.  High-dimensional Ising model selection using ℓ1-regularized logistic regression , 2010, 1010.0311.

[30]  Byron Boots,et al.  Reduced-Rank Hidden Markov Models , 2009, AISTATS.

[31]  Antonio Torralba,et al.  Exploiting hierarchical context on a large database of object categories , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Le Song,et al.  Hilbert Space Embeddings of Hidden Markov Models , 2010, ICML.

[33]  A. Willsky,et al.  Latent variable graphical model selection via convex optimization , 2010 .

[34]  Sham M. Kakade,et al.  Dimension-free tail inequalities for sums of random matrices , 2011, ArXiv.

[35]  Le Song,et al.  A Spectral Algorithm for Latent Tree Graphical Models , 2011, ICML.

[36]  Vincent Y. F. Tan,et al.  Learning Latent Tree Graphical Models , 2010, J. Mach. Learn. Res..