Tensor Decompositions for Learning Latent Variable Models (A Survey for ALT)

This note is an abridged version of the full paper [31]. It is intended as a survey for the 2015 Algorithmic Learning Theory (ALT) conference. This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) which exploits a certain tensor structure in their low-order observable moments, typically of second and third order. Specifically, parameter estimation is reduced to the problem of extracting a certain orthogonal decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be obtained efficiently by a variety of approaches, including power iterations and maximization approaches analogous to the matrix case. A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices [4]. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
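To make the orthogonal decomposition concrete: for the models above, a symmetric third-order tensor formed from the observable moments can be brought, after whitening with the second-order moments, into the orthogonally decomposable form T = sum_{i=1}^{k} lambda_i v_i ⊗ v_i ⊗ v_i with orthonormal v_i, and the eigenpairs (v_i, lambda_i) encode the model parameters. The sketch below is a minimal NumPy illustration of the plain tensor power iteration with deflation on such a tensor. It is not the paper's implementation: the function names and the restart and iteration counts are illustrative choices, and the robust variant analyzed in [31] adds the careful restart selection and perturbation guarantees that this bare iteration omits.

```python
import numpy as np

def tensor_apply(T, v):
    # Contract a symmetric 3-way tensor along its last two modes: T(I, v, v).
    return np.einsum('ijk,j,k->i', T, v, v)

def tensor_power_method(T, n_restarts=10, n_iters=100, tol=1e-12, rng=None):
    # Estimate one eigenpair (v, lam) of an orthogonally decomposable
    # symmetric tensor by power iteration with random restarts, keeping
    # the restart that attains the largest value of T(v, v, v).
    rng = np.random.default_rng() if rng is None else rng
    d = T.shape[0]
    best_v, best_lam = None, -np.inf
    for _ in range(n_restarts):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            w = tensor_apply(T, v)
            norm_w = np.linalg.norm(w)
            if norm_w < tol:       # started (numerically) in the null space
                break
            v_next = w / norm_w
            converged = np.linalg.norm(v_next - v) < tol
            v = v_next
            if converged:          # quadratic convergence in the exact case
                break
        lam = float(np.einsum('ijk,i,j,k->', T, v, v, v))
        if lam > best_lam:
            best_v, best_lam = v, lam
    return best_v, best_lam

def deflate(T, v, lam):
    # Remove a recovered rank-1 component lam * (v ⊗ v ⊗ v).
    return T - lam * np.einsum('i,j,k->ijk', v, v, v)

if __name__ == '__main__':
    # Synthetic orthogonally decomposable tensor, mimicking a whitened
    # third-order moment tensor: T = sum_i lam_i * v_i ⊗ v_i ⊗ v_i.
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # orthonormal columns
    lams = [3.0, 2.0, 1.0]
    T = sum(l * np.einsum('i,j,k->ijk', q, q, q) for l, q in zip(lams, Q.T))
    for _ in range(len(lams)):
        v, lam = tensor_power_method(T, rng=rng)
        T = deflate(T, v, lam)
        print(f'recovered eigenvalue: {lam:.4f}')  # typically 3, 2, 1 in order
```

Each power iteration costs O(d^3) on a dense d x d x d array; in applications the contraction T(I, v, v) is applied implicitly to the empirical moments, so the tensor never has to be materialized.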

[1] K. Pearson. Contributions to the Mathematical Theory of Evolution. 1894.

[2] R. Cattell. “Parallel proportional profiles” and other principles for determining the choice of factors by rotation. 1944.

[3] J. MacQueen. Some methods for classification and analysis of multivariate observations. 1967.

[4] P. Wedin. Perturbation bounds in connection with singular value decomposition. 1972.

[5] L. Le Cam. Asymptotic methods in statistical theory. 1986.

[6] L. Le Cam. Asymptotic Methods in Statistical Decision Theory. 1986.

[7] P. McCullagh. Tensor Methods in Statistics. 1987.

[8] J.-F. Cardoso. Super-symmetric decomposition of the fourth-order cumulant tensor: blind identification of more sources than sensors. In ICASSP, 1991.

[9] P. Comon. Independent component analysis, a new concept? Signal Processing, 1994.

[10] N. Delfosse and P. Loubaton. Adaptive blind separation of independent sources: a deflation approach. Signal Processing, 1995.

[11] P. Comon et al. Independent component analysis, a survey of some algebraic methods. In ISCAS, 1996.

[12] J. T. Chang. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Mathematical Biosciences, 1996.

[13] A. Frieze, M. Jerrum, and R. Kannan. Learning linear transformations. In FOCS, 1996.

[14] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 2000.

[15] L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 2000.

[16] T. Zhang and G. H. Golub. Rank-one approximation to high order tensors. SIAM J. Matrix Anal. Appl., 2001.

[17] E. Mossel and S. Roch. Learning nonsingular phylogenies and hidden Markov models. In STOC, 2005.

[18] L.-H. Lim. Singular values and eigenvalues of tensors: a variational approach. In CAMSAP, 2005.

[19] L. Qi. Eigenvalues of a real supersymmetric tensor. J. Symb. Comput., 2005.

[20] P. Comon, G. H. Golub, L.-H. Lim, and B. Mourrain. Symmetric tensors and symmetric tensor rank. SIAM J. Matrix Anal. Appl., 2008.

[21] T. Austin. On exchangeable random variables and the statistics of large graphs and hypergraphs. 2008. arXiv:0801.1698.

[22] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. J. Comput. Syst. Sci., 2008.

[23] A. Stegeman and P. Comon. Subtracting a best rank-1 approximation may increase tensor rank. In EUSIPCO, 2009.

[24] P. Comon and C. Jutten, editors. Handbook of Blind Source Separation: Independent Component Analysis and Applications. 2010.

[25] A. Moitra and G. Valiant. Settling the polynomial learnability of mixtures of Gaussians. In FOCS, 2010.

[26] T. G. Kolda and J. R. Mayo. Shifted power method for computing tensor eigenpairs. SIAM J. Matrix Anal. Appl., 2010.

[27] D. Hsu, S. M. Kakade, and P. Liang. Identifiability and unmixing of latent parse trees. In NIPS, 2012.

[28] A. Anandkumar, D. Hsu, and S. M. Kakade. A method of moments for mixture models and hidden Markov models. In COLT, 2012.

[29] A. Anandkumar, D. Hsu, F. Huang, and S. M. Kakade. Learning mixtures of tree graphical models. In NIPS, 2012.

[30] D. Hsu and S. M. Kakade. Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In ITCS, 2013.

[31] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. J. Mach. Learn. Res., 2014.

[32] A. Anandkumar, D. P. Foster, D. Hsu, S. M. Kakade, and Y.-K. Liu. A spectral algorithm for latent Dirichlet allocation. Algorithmica, 2012.