Learning mixtures of distributions

This thesis studies the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions D = {D1, ..., DT} together with mixing weights {w1, ..., wT} such that Σi wi = 1. A sample from the mixture is generated by choosing i with probability wi and then drawing a sample from distribution Di. Given samples from a mixture of distributions, the problem of learning the mixture is to find the parameters of the distributions comprising D and to group the samples according to their source distributions. A common theoretical framework for the problem also assumes a separation condition: a promise that any two distributions in the mixture are sufficiently different. In this thesis, we study three aspects of the problem.

First, in Chapter 3, we focus on optimizing the separation condition required for learning mixtures of distributions. The most common algorithms in practice are based on the singular value decomposition; these work when the separation is Θ(σ/√wmin), where σ is the maximum directional standard deviation of any distribution in the mixture and wmin is the minimum mixing weight. We show an algorithm that successfully learns mixtures of distributions under a separation condition that depends only logarithmically on the imbalance in the mixing weights. In particular, it succeeds for a separation between the centers of Θ(σ√(T log Λ)), where T is the number of distributions and Λ is polynomial in T and in the imbalance in the mixing weights. We require that the distance between the centers be spread across Θ(T log Λ) coordinates. In addition, we show that if every vector in the subspace spanned by the centers has a small projection, of the order of 1/√(T log Λ), on each coordinate vector, then our algorithm succeeds for a separation of only O(σ*√(T log Λ)), where σ* is the maximum directional standard deviation in the space containing the centers. Our algorithm works for binary product distributions and axis-aligned Gaussians. The spreading condition above is implied by the separation condition for binary product distributions, and it is necessary for algorithms that rely on linear correlations.

Motivated by an application in population genetics, in Chapter 4 we study the sample complexity of learning mixtures of binary product distributions. We take a step towards learning such mixtures with optimal sample complexity by providing an algorithm that learns a mixture of two binary product distributions with uniform mixing weights at low sample complexity. Our algorithm clusters all the samples correctly with high probability, so long as d²(μ1, μ2), the square of the Euclidean distance between the centers of the distributions, is at least polylogarithmic in s, the number of samples, and the following trade-off holds between the separation and the number of samples: s · d²(μ1, μ2) ≥ a · n log s · log(ns) for some constant a, where n is the dimension of the samples.

Finally, in Chapter 5, we study the problem of learning mixtures of heavy-tailed product distributions. To this end, we provide an embedding from R^n to {0, 1}^n' which maps random samples from distributions with medians that are far apart to random samples from distributions on {0, 1}^n' with centers that are far apart. The main application of our embedding is in designing an algorithm for learning mixtures of heavy-tailed distributions.
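To give intuition for how such an embedding can work, here is a minimal sketch (an illustrative simplification, not the exact construction of Chapter 5): threshold each coordinate at an empirical median, so that samples from R^n become binary vectors, and coordinates on which the component medians differ become Bernoulli coordinates with different means.

```python
import numpy as np

def median_threshold_embedding(samples):
    """Map samples in R^n to {0,1}^n by thresholding each coordinate
    at the empirical median of the pooled data.

    Illustrative sketch only: the thesis's embedding maps R^n to {0,1}^n'
    and is more involved; here n' = n and the thresholds are pooled medians.
    """
    samples = np.asarray(samples)
    thresholds = np.median(samples, axis=0)  # per-coordinate pooled medians
    return (samples > thresholds).astype(int)

# Toy demonstration: two heavy-tailed (Cauchy) product distributions whose
# medians differ on the first 20 coordinates.
rng = np.random.default_rng(0)
n, s = 100, 2000
shift = np.zeros(n)
shift[:20] = 3.0  # median separation, spread across 20 coordinates

a = rng.standard_cauchy((s, n))          # component 1: medians at 0
b = rng.standard_cauchy((s, n)) + shift  # component 2: medians at `shift`

embedded = median_threshold_embedding(np.vstack([a, b]))
ea, eb = embedded[:s], embedded[s:]

# Gaps between the binary centers (coordinate-wise means): large on the
# shifted coordinates, near zero on the rest.
print(np.abs(ea.mean(axis=0) - eb.mean(axis=0))[:25].round(2))
```

In this toy run the printed gaps are large on the 20 shifted coordinates and near zero elsewhere, even though the Cauchy components have no finite means; separated binary centers are exactly what lets clustering algorithms for the binary domain take over.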
We provide a polynomial-time algorithm that learns mixtures of general product distributions, as long as the distribution of each coordinate satisfies two properties: symmetry about the median, and ¾-radius upper-bounded by R. The separation our algorithm requires to correctly classify a 1 − δ fraction of the samples is that the distance between the medians of any two distributions in the mixture be O(R√(T log Λ) + R√(T log(T/δ))), and this distance should be spread across O(T log Λ + T log(T/δ)) coordinates. A second application of our embedding is in designing algorithms for learning mixtures of distributions with finite variance, which work under a separation requirement of O(σ*√(T log Λ)) and a spreading requirement of O(T log Λ + T log(T/δ)). This algorithm does not require the more stringent spreading condition needed by the algorithm that offers similar guarantees in Chapter 3.
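Returning to the setup defined at the start of this abstract, the generative process behind all of these results (choose component i with probability wi, then draw a sample from Di) can be made concrete with a short sketch; the weights and centers below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)

# A mixture of T = 2 binary product distributions over {0,1}^n.
# Component i is specified by its center mu[i] in [0,1]^n (coordinate means);
# these particular weights and centers are arbitrary illustrative values.
w = np.array([0.5, 0.5])                 # mixing weights, sum to 1
mu = np.array([[0.2] * 10 + [0.8] * 10,  # center of D1
               [0.8] * 10 + [0.2] * 10]) # center of D2

def sample_mixture(s):
    """Draw s samples: pick component i ~ w, then x_j ~ Bernoulli(mu[i, j])."""
    labels = rng.choice(len(w), size=s, p=w)
    return rng.random((s, mu.shape[1])) < mu[labels], labels

x, labels = sample_mixture(1000)
# The learner sees only x; recovering `labels` (the clustering) and `mu`
# (the parameters) is the learning problem studied in this thesis.
print(x[labels == 0].mean(axis=0).round(2))  # close to mu[0]
```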
