Paper information - Tensor decompositions for learning latent variable models

Tensor decompositions for learning latent variable models

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation. The method exploits a certain tensor structure in the low-order observable moments of these models (typically of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
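To make the power-iteration idea in the abstract concrete, the following is a minimal sketch, not the paper's analyzed algorithm. It assumes a small symmetric third-order tensor that is exactly orthogonally decomposable with positive weights, T = sum_i w_i v_i ⊗ v_i ⊗ v_i, repeats the map u -> T(I, u, u) / ||T(I, u, u)|| from a few random starts, keeps the candidate with the largest T(u, u, u), and deflates. All function names, the restart and iteration counts, and the toy tensor are illustrative assumptions.

```python
import math
import random


def build_tensor(weights, vectors):
    """Form T = sum_i w_i * (v_i x v_i x v_i) as nested lists (toy helper)."""
    n = len(vectors[0])
    return [[[sum(w * v[a] * v[b] * v[c] for w, v in zip(weights, vectors))
              for c in range(n)] for b in range(n)] for a in range(n)]


def apply_tensor(T, u):
    """Return the vector T(I, u, u): contract the last two modes of T with u."""
    n = len(u)
    return [sum(T[a][b][c] * u[b] * u[c] for b in range(n) for c in range(n))
            for a in range(n)]


def normalize(v):
    """Scale v to unit Euclidean norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(T, rng, n_restarts=10, n_iters=50):
    """Iterate u -> T(I,u,u)/||T(I,u,u)|| from several random starts and
    keep the candidate with the largest value T(u,u,u)."""
    n = len(T)
    best_val, best_vec = -math.inf, None
    for _ in range(n_restarts):
        u = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_iters):
            u = normalize(apply_tensor(T, u))
        val = sum(x * y for x, y in zip(u, apply_tensor(T, u)))  # T(u,u,u)
        if val > best_val:
            best_val, best_vec = val, u
    return best_val, best_vec


def deflate(T, lam, v):
    """Subtract the rank-one term lam * (v x v x v) from T."""
    n = len(v)
    return [[[T[a][b][c] - lam * v[a] * v[b] * v[c]
              for c in range(n)] for b in range(n)] for a in range(n)]


def decompose(T, k, seed=0):
    """Extract k (weight, vector) pairs by power iteration plus deflation."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        lam, v = power_iteration(T, rng)
        pairs.append((lam, v))
        T = deflate(T, lam, v)
    return pairs


# Toy input (an assumption for illustration): three orthonormal vectors
# in R^3 with positive weights 3, 2, 1.
s = 1.0 / math.sqrt(2.0)
true_vectors = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
true_weights = [3.0, 2.0, 1.0]
recovered = decompose(build_tensor(true_weights, true_vectors), k=3)
for lam, v in recovered:
    print(round(lam, 6), [round(x, 6) for x in v])
```

The sketch works on an exact, noise-free tensor. In an estimation setting the tensor would come from empirical moments, typically after a whitening step, and the robustness of the iteration to that estimation error is what the paper's perturbation analysis addresses.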
