Settling the Sample Complexity for Learning Mixtures of Gaussians

We prove that $\widetilde{\Theta}(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbf{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the best known upper bound and the best known lower bound for this problem. For mixtures of axis-aligned Gaussians, we show that $\widetilde{O}(k d / \varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic learning setting as well. The upper bound is based on a novel technique for distribution learning built on a notion of sample compression: any class of distributions that admits such a sample compression scheme can be learned with few samples, and if a class of distributions admits a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $\mathbf{R}^d$ admits an efficient sample compression scheme.
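
To make the compression notion concrete, here is a hedged paraphrase of the kind of definition involved; the parameter names $\tau$, $t$, $m$ and the $2/3$ success probability are illustrative assumptions rather than quotations from the paper.

```latex
% Sketch of a sample compression scheme for distribution learning.
% Parameter names and the 2/3 constant are illustrative assumptions.
A class $\mathcal{F}$ of distributions admits $(\tau, t, m)$-compression
if there is a decoder $\mathcal{J}$ such that for every $f \in \mathcal{F}$
and every $\varepsilon > 0$: given $m(\varepsilon)$ i.i.d.\ samples from $f$,
with probability at least $2/3$ some subsequence of at most
$\tau(\varepsilon)$ of the samples, together with at most $t(\varepsilon)$
extra bits, is mapped by $\mathcal{J}$ to a distribution $\hat{f}$
satisfying $d_{\mathrm{TV}}(f, \hat{f}) \le \varepsilon$.
```

The sketch below is a minimal, runnable toy of the encode/decode round trip for a single Gaussian, assuming numpy. The `encode` and `decode` helpers are hypothetical names: the encoder here merely keeps a prefix of the samples and the decoder refits by empirical moments, whereas the paper's actual scheme selects the retained points far more carefully and attaches discretized side information.

```python
# Toy sketch of "compress a Gaussian into a few of its own samples".
# NOT the paper's encoder: keeping a sample prefix and refitting by
# empirical moments only illustrates the encode/decode interface.
import numpy as np

def encode(samples: np.ndarray, tau: int) -> np.ndarray:
    """Encoder (hypothetical): keep only the first tau sample points."""
    return samples[:tau]

def decode(kept: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Decoder (hypothetical): refit a Gaussian by empirical moments."""
    mu_hat = kept.mean(axis=0)
    sigma_hat = np.cov(kept, rowvar=False)
    return mu_hat, sigma_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 5
    true_mu = rng.normal(size=d)
    A = rng.normal(size=(d, d))
    true_sigma = A @ A.T + np.eye(d)          # well-conditioned covariance

    samples = rng.multivariate_normal(true_mu, true_sigma, size=10_000)
    kept = encode(samples, tau=50 * d * d)    # ~O(d^2) points retained
    mu_hat, sigma_hat = decode(kept)

    print("mean error:", np.linalg.norm(mu_hat - true_mu))
    print("cov error :", np.linalg.norm(sigma_hat - true_sigma, ord="fro"))
```

A Gaussian in $\mathbf{R}^d$ is determined by $\Theta(d^2)$ parameters, which is loosely why the retained-point budget above scales like $d^2$; that same $d^2$ is what reappears, multiplied by $k$ for mixtures, in the $\widetilde{\Theta}(k d^2 / \varepsilon^2)$ bound.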
