Sample-Efficient Learning of Mixtures

We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution and want to output a distribution that is close to the target in total variation distance. Let $\mathcal F$ be an arbitrary class of probability distributions, and let $\mathcal F^k$ denote the class of $k$-mixtures of elements of $\mathcal F$. Assuming the existence of a method for learning $\mathcal F$ with sample complexity $m_{\mathcal F}(\epsilon)$, we provide a method for learning $\mathcal F^k$ with sample complexity $O(k\log k \cdot m_{\mathcal F}(\epsilon)/\epsilon^2)$. Our mixture learning algorithm has the property that, if the $\mathcal F$-learner is proper or agnostic, then the $\mathcal F^k$-learner is proper or agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $\widetilde{O}(kd/\epsilon^4)$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\widetilde{O}(kd^2/\epsilon^4)$, which improves the previously known bounds of $\widetilde{O}(k^3d^2/\epsilon^4)$ and $\widetilde{O}(k^4d^4/\epsilon^2)$ in the dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\widetilde{O}(d^{(d+5)/2}\epsilon^{-(d+9)/2}k)$ samples.
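To make the sample-complexity accounting concrete, here is a minimal sketch, assuming base bounds of $\widetilde{O}(d/\epsilon^2)$ for learning a single axis-aligned Gaussian and $\widetilde{O}(d^2/\epsilon^2)$ for a single general Gaussian in $\mathbb{R}^d$; it plugs these base complexities $m_{\mathcal F}(\epsilon)$ into the general $O(k\log k \cdot m_{\mathcal F}(\epsilon)/\epsilon^2)$ theorem, with all constants and logarithmic factors suppressed, to recover the two Gaussian corollaries above. The function name and the unit constants are illustrative, not part of the paper.

```python
import math

def mixture_bound(m_F, k, eps):
    # General theorem (constants and log factors suppressed):
    # learning a k-mixture of F costs on the order of
    #   k * log(k) * m_F(eps) / eps^2
    # samples, where m_F(eps) is the sample complexity of
    # learning a single element of F.
    return k * math.log(k) * m_F(eps) / eps ** 2

# Illustrative base-class complexities (hypothetical unit constants):
axis_aligned_gaussian = lambda eps, d=10: d / eps ** 2        # ~ d / eps^2
general_gaussian      = lambda eps, d=10: d ** 2 / eps ** 2   # ~ d^2 / eps^2

k, eps = 5, 0.1
print(mixture_bound(axis_aligned_gaussian, k, eps))  # ~ k log k * d / eps^4
print(mixture_bound(general_gaussian, k, eps))       # ~ k log k * d^2 / eps^4
```

The log-concave corollary follows by the same substitution, assuming a single-distribution complexity on the order of $d^{(d+5)/2}\epsilon^{-(d+5)/2}$: dividing by the extra $\epsilon^2$ and multiplying by $k\log k$ yields $\widetilde{O}(d^{(d+5)/2}\epsilon^{-(d+9)/2}k)$.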
