A Method of Moments for Mixture Models and Hidden Markov Models

Mixture models are a fundamental tool in applied statistics and machine learning for modeling data drawn from multiple subpopulations. Current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm), which are prone to failure, while existing consistent methods are unattractive because their computational and sample complexity typically scales exponentially with the number of mixture components. This work develops an efficient method-of-moments approach to parameter estimation for a broad class of high-dimensional mixture models with many components, including multi-view mixtures of Gaussians (such as mixtures of axis-aligned Gaussians) and hidden Markov models. The new method yields rigorous unsupervised learning results for mixture models that were not achieved by previous work and, because of its simplicity, offers a viable alternative to EM in practice.
