Robust Model Selection and Nearly-Proper Learning for GMMs

In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance? The problem of estimating the number of components, also called model selection, is important in its own right, but there are essentially no known efficient algorithms with provable guarantees, let alone ones that can tolerate adversarial corruptions. In this work, we study the problem of robust model selection for univariate Gaussian mixture models (GMMs). Given $\textsf{poly}(k/\epsilon)$ samples from a distribution that is $\epsilon$-close in TV distance to a GMM with $k$ components, we can construct a GMM with $\widetilde{O}(k)$ components that approximates the distribution to within $\widetilde{O}(\epsilon)$ in $\textsf{poly}(k/\epsilon)$ time. Thus we can approximately determine the minimum number of components needed to fit the distribution, up to a logarithmic factor. Prior to our work, the only known algorithms for learning arbitrary univariate GMMs either output significantly more than $k$ components (e.g., $k/\epsilon^2$ components for kernel density estimates) or run in time exponential in $k$. Moreover, by adapting our techniques we obtain similar results for reconstructing Fourier-sparse signals.
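
To make the model-selection objective concrete, below is a minimal brute-force sketch in Python (assuming `numpy` and `scikit-learn`, with a histogram-based proxy for TV distance; the helpers `tv_proxy` and `select_components` are hypothetical names introduced here). It fits GMMs with increasing numbers of components and returns the smallest count whose fit is within a tolerance of the data. This only illustrates the objective being optimized; it is not the paper's algorithm and carries none of its robustness or runtime guarantees.

```python
# A brute-force illustration of the model-selection objective from the
# abstract: find the smallest number of components whose GMM fit is close
# to the data under a (crude) proxy for total variation distance.
# NOT the paper's algorithm; no robustness or runtime guarantees.
import numpy as np
from sklearn.mixture import GaussianMixture


def tv_proxy(samples, gmm, n_bins=100):
    """Crude TV-distance proxy: 0.5 * L1 distance between a histogram
    density estimate of the samples and the fitted GMM density."""
    edges = np.linspace(samples.min(), samples.max(), n_bins + 1)
    emp, _ = np.histogram(samples, bins=edges, density=True)
    mids = 0.5 * (edges[:-1] + edges[1:])
    # GMM density evaluated at bin centers (score_samples returns log-density).
    fit = np.exp(gmm.score_samples(mids.reshape(-1, 1)))
    width = edges[1] - edges[0]
    return 0.5 * np.sum(np.abs(emp - fit)) * width


def select_components(samples, eps, k_max=20):
    """Return the smallest k (up to k_max) whose fitted GMM is within
    eps of the data under the TV proxy."""
    X = samples.reshape(-1, 1)
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
        if tv_proxy(samples, gmm) <= eps:
            return k, gmm
    return k_max, gmm  # fall back to the largest model tried


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Samples from a true 2-component mixture.
    data = np.concatenate([rng.normal(-3.0, 1.0, 5000),
                           rng.normal(2.0, 0.5, 5000)])
    k, model = select_components(data, eps=0.1)
    print(f"selected {k} components")
```

On this well-separated two-component example the sketch selects $k=2$; unlike the paper's guarantee, though, the exhaustive search gives no protection against adversarially corrupted samples and no control of the output size beyond `k_max`.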
