Spectral methods for learning discrete latent tree models

We consider the problems of structure learning and parameter estimation for discrete latent tree models. For structure learning, we introduce a generalized information distance between variables, defined via the singular values of joint probability matrices, and use it to build a bottom-up algorithm for structure recovery. The algorithm is proven consistent, and a finite-sample bound is given for exact structure recovery. For parameter estimation, we propose a novel matrix decomposition algorithm for the case where every latent variable has two states. Unlike the expectation-maximization (EM) algorithm, our algorithm is not prone to getting trapped in local optima; it, too, is proven consistent, and a finite-sample bound is given for the estimation error. For both structure learning and parameter estimation, empirical results support the theory. As an application to real data, we analyze the Changchun mayor hotline data and recover latent structures underlying Chinese words, demonstrating that the proposed method is effective for discovering hierarchical structure and latent information.
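To make the structure-recovery ingredient concrete, the sketch below computes one plausible rank-k spectral distance from an empirical joint probability matrix: the negative log of the product of the top-k singular values of the joint matrix, normalized by the corresponding products for the diagonal marginal matrices. This is a minimal illustration under assumed notation (the function name, the rank parameter `k`, and the exact form of the distance are assumptions, not the paper's definition); when both variables share the same cardinality and `k` equals it, the product of singular values equals the absolute determinant, so the formula reduces to the classical log-det (paralinear) distance.

```python
import numpy as np

def spectral_distance(P_ij, k):
    """Hypothetical rank-k spectral information distance between two discrete
    variables X_i and X_j, given their joint probability matrix P_ij
    (rows index states of X_i, columns index states of X_j):

        d(i, j) = -log( prod_{l<=k} sigma_l(P_ij)
                        / sqrt( prod_{l<=k} sigma_l(D_i) * prod_{l<=k} sigma_l(D_j) ) )

    where sigma_l is the l-th largest singular value and D_i, D_j are the
    diagonal matrices of the marginals. Assumed form, for illustration only.
    """
    # Top-k singular values of the joint matrix (returned in descending order).
    s_joint = np.linalg.svd(P_ij, compute_uv=False)[:k]
    # The singular values of diag(p) are just the entries of p sorted descending.
    p_i = np.sort(P_ij.sum(axis=1))[::-1][:k]
    p_j = np.sort(P_ij.sum(axis=0))[::-1][:k]
    return -np.log(np.prod(s_joint) / np.sqrt(np.prod(p_i) * np.prod(p_j)))

# Toy usage: two correlated binary variables, k = 2. Here det(P) = 0.15 and
# both marginals are uniform, so d = -log(0.15 / 0.25) = -log(0.6) ~ 0.51.
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])
print(spectral_distance(P, k=2))
```

Distances of this kind are useful for structure recovery because, on a tree model, they behave additively along paths between nodes, so pairwise distances between observed leaves carry enough information to identify sibling groups and join them bottom-up under latent parents.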
