Tensor Decompositions for Learning Latent Variable Models (A Survey for ALT)

This note is an abridged version of the full paper [31]. It is intended as a survey for the 2015 Algorithmic Learning Theory (ALT) conference. This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) which exploits a certain tensor structure in their low-order observable moments, typically of second and third order. Specifically, parameter estimation is reduced to the problem of extracting a certain orthogonal decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be obtained efficiently by a variety of approaches, including power iterations and maximization approaches analogous to the matrix case. A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices [4]. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
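To make the orthogonal decomposition concrete: for the models above, a symmetric third-order tensor formed from the observable moments can be brought, after whitening with the second-order moments, into the orthogonally decomposable form T = sum_{i=1}^{k} lambda_i v_i ⊗ v_i ⊗ v_i with orthonormal v_i, and the eigenpairs (v_i, lambda_i) encode the model parameters. The sketch below is a minimal NumPy illustration of the plain tensor power iteration with deflation on such a tensor. It is not the paper's implementation: the function names and the restart and iteration counts are illustrative choices, and the robust variant analyzed in [31] adds the careful restart selection and perturbation guarantees that this bare iteration omits.

```python
import numpy as np

def tensor_apply(T, v):
    # Contract a symmetric 3-way tensor along its last two modes: T(I, v, v).
    return np.einsum('ijk,j,k->i', T, v, v)

def tensor_power_method(T, n_restarts=10, n_iters=100, tol=1e-12, rng=None):
    # Estimate one eigenpair (v, lam) of an orthogonally decomposable
    # symmetric tensor by power iteration with random restarts, keeping
    # the restart that attains the largest value of T(v, v, v).
    rng = np.random.default_rng() if rng is None else rng
    d = T.shape[0]
    best_v, best_lam = None, -np.inf
    for _ in range(n_restarts):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            w = tensor_apply(T, v)
            norm_w = np.linalg.norm(w)
            if norm_w < tol:       # started (numerically) in the null space
                break
            v_next = w / norm_w
            converged = np.linalg.norm(v_next - v) < tol
            v = v_next
            if converged:          # quadratic convergence in the exact case
                break
        lam = float(np.einsum('ijk,i,j,k->', T, v, v, v))
        if lam > best_lam:
            best_v, best_lam = v, lam
    return best_v, best_lam

def deflate(T, v, lam):
    # Remove a recovered rank-1 component lam * (v ⊗ v ⊗ v).
    return T - lam * np.einsum('i,j,k->ijk', v, v, v)

if __name__ == '__main__':
    # Synthetic orthogonally decomposable tensor, mimicking a whitened
    # third-order moment tensor: T = sum_i lam_i * v_i ⊗ v_i ⊗ v_i.
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # orthonormal columns
    lams = [3.0, 2.0, 1.0]
    T = sum(l * np.einsum('i,j,k->ijk', q, q, q) for l, q in zip(lams, Q.T))
    for _ in range(len(lams)):
        v, lam = tensor_power_method(T, rng=rng)
        T = deflate(T, v, lam)
        print(f'recovered eigenvalue: {lam:.4f}')  # typically 3, 2, 1 in order
```

Each power iteration costs O(d^3) on a dense d x d x d array; in applications the contraction T(I, v, v) is applied implicitly to the empirical moments, so the tensor never has to be materialized.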

[1] K. Pearson. Contributions to the Mathematical Theory of Evolution. 1894.

[2] R. Cattell. “Parallel proportional profiles” and other principles for determining the choice of factors by rotation. 1944.

[3] J. MacQueen. Some methods for classification and analysis of multivariate observations. 1967.

[4] P. Wedin. Perturbation bounds in connection with singular value decomposition. 1972.

[5] L. Le Cam. Asymptotic methods in statistical theory. 1986.

[6] L. Le Cam. Asymptotic Methods in Statistical Decision Theory. 1986.

[7] P. McCullagh. Tensor Methods in Statistics. 1987.

[8] J.-F. Cardoso. Super-symmetric decomposition of the fourth-order cumulant tensor: blind identification of more sources than sensors. In ICASSP, 1991.

[9] P. Comon. Independent component analysis, a new concept? Signal Processing, 1994.

[10] N. Delfosse and P. Loubaton. Adaptive blind separation of independent sources: a deflation approach. Signal Processing, 1995.

[11] P. Comon et al. Independent component analysis, a survey of some algebraic methods. In ISCAS, 1996.

[12] J. T. Chang. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Mathematical Biosciences, 1996.

[13] A. Frieze, M. Jerrum, and R. Kannan. Learning linear transformations. In FOCS, 1996.

[14] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 2000.

[15] L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 2000.

[16] T. Zhang and G. H. Golub. Rank-one approximation to high order tensors. SIAM J. Matrix Anal. Appl., 2001.

[17] E. Mossel and S. Roch. Learning nonsingular phylogenies and hidden Markov models. In STOC, 2005.

[18] L.-H. Lim. Singular values and eigenvalues of tensors: a variational approach. In CAMSAP, 2005.

[19] L. Qi. Eigenvalues of a real supersymmetric tensor. J. Symb. Comput., 2005.

[20] P. Comon, G. H. Golub, L.-H. Lim, and B. Mourrain. Symmetric tensors and symmetric tensor rank. SIAM J. Matrix Anal. Appl., 2008.

[21] T. Austin. On exchangeable random variables and the statistics of large graphs and hypergraphs. 2008. arXiv:0801.1698.

[22] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. J. Comput. Syst. Sci., 2008.

[23] A. Stegeman and P. Comon. Subtracting a best rank-1 approximation may increase tensor rank. In EUSIPCO, 2009.

[24] P. Comon and C. Jutten, editors. Handbook of Blind Source Separation: Independent Component Analysis and Applications. 2010.

[25] A. Moitra and G. Valiant. Settling the polynomial learnability of mixtures of Gaussians. In FOCS, 2010.

[26] T. G. Kolda and J. R. Mayo. Shifted power method for computing tensor eigenpairs. SIAM J. Matrix Anal. Appl., 2010.

[27] D. Hsu, S. M. Kakade, and P. Liang. Identifiability and unmixing of latent parse trees. In NIPS, 2012.

[28] A. Anandkumar, D. Hsu, and S. M. Kakade. A method of moments for mixture models and hidden Markov models. In COLT, 2012.

[29] A. Anandkumar, D. Hsu, F. Huang, and S. M. Kakade. Learning mixtures of tree graphical models. In NIPS, 2012.

[30] D. Hsu and S. M. Kakade. Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In ITCS, 2013.

[31] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. J. Mach. Learn. Res., 2014.

[32] A. Anandkumar, D. P. Foster, D. Hsu, S. M. Kakade, and Y.-K. Liu. A spectral algorithm for latent Dirichlet allocation. Algorithmica, 2012.