Disentangling Gaussians

The Gaussian mixture model (GMM) is one of the oldest and most widely used statistical models. It consists of a weighted combination of heterogeneous Gaussian sources. As a simple one-dimensional example, consider measurements of the heights of adults in a certain population, where the distribution of heights is closely approximated by a mixture of two univariate Gaussians, one for males and one for females [3]. Can one recover the parameters of the Gaussians from unlabeled height measurements alone (with no gender information)?

This paper focuses on the case where the mixture consists of a small but unknown number of Gaussians that may overlap; the combined density may even have a single peak, as in the height example, and the dimensionality may be high. Much of the previous work on this problem attempts to learn the parameters through clustering, and consequently must make a strong separation assumption on the components of the mixture. The primary contribution of our research is to avoid this assumption by instead basing our learning algorithm on the algebraic structure of the mixture. Our algorithm succeeds even if the components overlap almost entirely, in which case accurate clustering is no longer possible. We give a simple notion of the “condition number” of a GMM which characterizes its complexity up to polynomial factors. Broadly speaking, the conclusion is that the statistical and computational complexity of this general problem is polynomial in every parameter except the number of Gaussians, on which the dependence is necessarily exponential.

Statisticians have long known that the Gaussians in a GMM can be identified in the limit from random samples: given sufficiently many examples, one can eventually recover each subpopulation’s mean, variance, and mixing proportion to arbitrary precision [14]. However, their analysis provides no bounds on the rate of convergence, which might be exponentially slow even for two Gaussians in one dimension. Moreover, heuristics in widespread use, such as the EM algorithm, suffer from the risk of converging to poor local optima and come with no general guarantees.
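To make the moment-based viewpoint concrete, the following is a minimal sketch (in Python, using NumPy and SciPy) of parameter recovery for a one-dimensional mixture of two Gaussians by matching empirical moments, in the spirit of Pearson's method of moments [1]. It is an illustrative toy rather than the algorithm of this paper; the simulated data, parameter names, and optimization setup are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch only (not this paper's algorithm): recover the five
# parameters of a 1-D two-component Gaussian mixture by matching empirical
# moments, in the spirit of Pearson's method of moments. All names and
# settings below are assumptions made for this example.

rng = np.random.default_rng(0)

# "Height"-style data: two overlapping Gaussians, no labels.
true = dict(w=0.5, mu1=64.0, s1=2.5, mu2=70.0, s2=3.0)
n = 200_000
z = rng.random(n) < true["w"]
x = np.where(z,
             rng.normal(true["mu1"], true["s1"], n),
             rng.normal(true["mu2"], true["s2"], n))

def gaussian_raw_moments(mu, s, k):
    """E[X^j] for X ~ N(mu, s^2), j = 1..k, via the standard recursion
    m_j = mu * m_{j-1} + (j - 1) * s^2 * m_{j-2}."""
    m = [1.0, mu]
    for j in range(2, k + 1):
        m.append(mu * m[j - 1] + (j - 1) * s ** 2 * m[j - 2])
    return np.array(m[1:k + 1])

def mixture_moments(w, mu1, s1, mu2, s2, k):
    """First k raw moments of the two-component mixture."""
    return (w * gaussian_raw_moments(mu1, s1, k)
            + (1 - w) * gaussian_raw_moments(mu2, s2, k))

K = 6  # six moments to pin down the five unknown parameters
emp = np.array([np.mean(x ** j) for j in range(1, K + 1)])
scale = np.abs(emp)  # normalize so the high-order moments do not dominate

def loss(theta):
    w, mu1, s1, mu2, s2 = theta
    return np.sum(((mixture_moments(w, mu1, s1, mu2, s2, K) - emp) / scale) ** 2)

# Local optimization of the moment-matching objective; like EM, this step
# can get stuck in poor local optima without a reasonable starting point.
res = minimize(loss,
               x0=[0.5, np.percentile(x, 25), x.std(),
                   np.percentile(x, 75), x.std()],
               bounds=[(0.01, 0.99), (None, None), (0.1, None),
                       (None, None), (0.1, None)])
print(dict(zip(["w", "mu1", "s1", "mu2", "s2"], np.round(res.x, 2))))
```

On data like this, the recovered values are typically close to the true ones; as the components overlap more, many more samples are needed to estimate the moments accurately enough, which is the regime the condition number is intended to capture.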

[1] K. Pearson. Contributions to the mathematical theory of evolution, 1894.

[2] D. Rubin et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.

[3] G. M. Tallis et al. Identifiability of mixtures. Journal of the Australian Mathematical Society, Series A, 1982.

[4] R. Redner et al. Mixture densities, maximum likelihood, and the EM algorithm, 1984.

[5] D. Burmaster et al. Bivariate distributions for height and weight of men and women in the United States. Risk Analysis, 1992.

[6] S. Dasgupta et al. Learning mixtures of Gaussians. FOCS, 1999.

[7] S. Dasgupta et al. A two-round variant of EM for Gaussian mixtures. UAI, 2000.

[8] S. Arora et al. Learning mixtures of arbitrary Gaussians. STOC, 2001.

[9] S. S. Vempala et al. A spectral algorithm for learning mixture models. Journal of Computer and System Sciences, 2004.

[10] S. S. Vempala et al. The Random Projection Method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 2005.

[11] D. Achlioptas et al. On spectral learning of mixtures of distributions. COLT, 2005.

[12] J. Feldman et al. PAC learning axis-aligned mixtures of Gaussians with no separation assumption. COLT, 2006.

[13] S. S. Vempala et al. The spectral method for general mixture models. SIAM Journal on Computing, 2008.

[14] S. S. Vempala et al. Isotropic PCA and affine-invariant clustering. FOCS, 2008.

[15] S. M. Kakade et al. A spectral algorithm for learning hidden Markov models. Journal of Computer and System Sciences, 2008.