Outlier-Robust Clustering of Non-Spherical Mixtures

We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated $d$-dimensional Gaussians ($k$-GMMs). Concretely, our algorithm takes as input an $\epsilon$-corrupted sample from a $k$-GMM and, with high probability, in time $d^{\text{poly}(k/\eta)}$ outputs an approximate clustering that misclassifies at most a $k^{O(k)}(\epsilon+\eta)$ fraction of the points whenever every pair of mixture components is separated by $1-\exp(-\text{poly}(k/\eta)^k)$ in total variation (TV) distance. Such a result was not previously known even for $k=2$. TV separation is the statistically weakest possible notion of separation and captures important special cases such as mixed linear regression and subspace clustering. Our main conceptual contribution is to distill two simple analytic properties, (certifiable) hypercontractivity and anti-concentration, that are necessary and sufficient for mixture models to be (efficiently) clusterable. As a consequence, our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere. Even the information-theoretic clusterability of separated distributions satisfying these two analytic assumptions was not known prior to our work and is likely to be of independent interest. Our algorithms build on the recent sequence of works relying on certifiable anti-concentration, first introduced in [KKK'19, RY'20]. Our techniques expand the sum-of-squares toolkit to show robust certifiability of TV-separated Gaussian clusters in data. This involves giving low-degree sum-of-squares proofs of statements that relate parameter distance (i.e., distance between means and covariances) to total variation distance, relying only on hypercontractivity and anti-concentration.
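The abstract's guarantees are stated in terms of TV distance between mixture components. As a toy illustration (not part of the paper), the sketch below compares the closed-form TV distance between two one-dimensional Gaussians with equal variances against a generic Monte Carlo estimate based on the identity $d_{TV}(P,Q) = \tfrac{1}{2}\,\mathbb{E}_{x\sim P}\,|1 - q(x)/p(x)|$; all function names here are hypothetical, chosen for this example.

```python
import math
import random

def tv_equal_var(mu1, mu2, sigma):
    # Closed form for N(mu1, sigma^2) vs. N(mu2, sigma^2):
    # TV = erf(|mu1 - mu2| / (2 * sqrt(2) * sigma)).
    return math.erf(abs(mu1 - mu2) / (2 * math.sqrt(2) * sigma))

def tv_monte_carlo(mu1, s1, mu2, s2, n=200_000, seed=0):
    # Generic estimator: TV(P, Q) = 0.5 * E_{x~P} |1 - q(x)/p(x)|,
    # which works for any pair of means/variances.
    rng = random.Random(seed)

    def pdf(x, mu, s):
        return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu1, s1)  # sample from P
        total += abs(1.0 - pdf(x, mu2, s2) / pdf(x, mu1, s1))
    return 0.5 * total / n
```

For instance, `tv_equal_var(0.0, 1.0, 1.0)` is about 0.38, and the Monte Carlo estimate agrees to within sampling error; note that components at TV distance close to 1 (the separation regime in the abstract) are nearly disjoint as distributions even when their means and covariances differ only in a single direction.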

References

[1] Yu Cheng et al. High-Dimensional Robust Mean Estimation in Nearly-Linear Time. SODA, 2018.
[2] Daniel M. Kane et al. Statistical Query Lower Bounds for Robust Estimation of High-Dimensional Gaussians and Gaussian Mixtures. FOCS, 2017.
[3] Prasad Raghavendra et al. List Decodable Learning via Sum of Squares. SODA, 2019.
[4] Inderjit S. Dhillon et al. Mixed Linear Regression with Multiple Components. NIPS, 2016.
[5] Constantine Caramanis et al. A Convex Formulation for Mixed Regression: Near Optimal Rates in the Face of Noise. arXiv, 2013.
[6] J. Lasserre. New Positive Semidefinite Relaxations for Nonconvex Quadratic Programs. 2001.
[7] Pravesh Kothari et al. Quantum entanglement, sum of squares, and the log rank conjecture. Electron. Colloquium Comput. Complex., 2017.
[8] Constantine Caramanis et al. Alternating Minimization for Mixed Linear Regression. ICML, 2013.
[9] Eli Ben-Sasson et al. Size space tradeoffs for resolution. STOC, 2002.
[10] Adam Tauman Kalai et al. Efficiently learning mixtures of two Gaussians. STOC, 2010.
[11] L. Devroye et al. The total variation distance between high-dimensional Gaussians. arXiv:1810.08693, 2018.
[12] Daniel M. Kane et al. Recent Advances in Algorithmic High-Dimensional Robust Statistics. arXiv, 2019.
[13] S. Charles Brubaker et al. Robust PCA and clustering in noisy mixtures. SODA, 2009.
[14] R. Vershynin. How Close is the Sample Covariance Matrix to the Actual Covariance Matrix? arXiv:1004.3484, 2010.
[15] Sanjoy Dasgupta et al. Learning mixtures of Gaussians. FOCS, 1999.
[16] Huan Liu et al. Subspace clustering for high dimensional data: a review. SIGKDD Explorations, 2004.
[17] M. Laurent. Sums of Squares, Moment Matrices and Optimization Over Polynomials. 2009.
[18] David Steurer et al. Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method. STOC, 2014.
[19] Yuanzhi Li et al. Learning Mixtures of Linear Regressions with Nearly Optimal Complexity. COLT, 2018.
[20] K. Pearson. Contributions to the Mathematical Theory of Evolution. 1894.
[21] Zhao Song et al. Learning mixtures of linear regressions in subexponential time via Fourier moments. STOC, 2019.
[22] Sanjeev Arora et al. Learning mixtures of arbitrary gaussians. STOC, 2001.
[23] Santosh S. Vempala et al. Agnostic Estimation of Mean and Covariance. FOCS, 2016.
[24] Pravesh Kothari et al. Outlier-robust moment-estimation via sum-of-squares. arXiv, 2017.
[25] Martin Grötschel et al. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1981.
[26] Jerry Li et al. Being Robust (in High Dimensions) Can Be Practical. ICML, 2017.
[27] Jerry Li et al. Mixture models, robustness, and sum of squares proofs. STOC, 2017.
[28] Ilias Diakonikolas et al. Robustly Learning any Clusterable Mixture of Gaussians. arXiv, 2020.
[29] Santosh S. Vempala et al. A spectral algorithm for learning mixture models. J. Comput. Syst. Sci., 2004.
[30] He Jia et al. Robustly Clustering a Mixture of Gaussians. arXiv, 2019.
[31] Jerry Li et al. Sever: A Robust Meta-Algorithm for Stochastic Optimization. ICML, 2018.
[32] Gregory Valiant et al. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers. ITCS, 2017.
[33] P. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. 2000.
[34] Gregory Valiant et al. Learning from untrusted data. STOC, 2016.
[35] Adam R. Klivans et al. List-Decodable Linear Regression. NeurIPS, 2019.
[36] Sivaraman Balakrishnan et al. Robust estimation via robust gradient estimation. Journal of the Royal Statistical Society: Series B, 2018.
[37] T. Sanders et al. Analysis of Boolean Functions. arXiv, 2012.
[38] J. Gallier. Quadratic Optimization Problems. In Linear Algebra and Optimization with Applications to Machine Learning, 2020.
[39] Rocco A. Servedio et al. Bounded Independence Fools Halfspaces. FOCS, 2009.
[40] Gilda Soromenho et al. Fitting mixtures of linear regressions. 2010.
[41] Pravesh Kothari et al. List-Decodable Subspace Recovery via Sum-of-Squares. arXiv, 2020.
[42] Martin J. Wainwright et al. Statistical guarantees for the EM algorithm: From population to sample-based analysis. arXiv, 2014.
[43] Robert A. Jacobs et al. Hierarchical Mixtures of Experts and the EM Algorithm. Neural Computation, 1993.
[44] Ryan O'Donnell et al. Hypercontractive inequalities via SOS, and the Frankl-Rödl graph. SODA, 2012.
[45] Pravesh Kothari et al. Semialgebraic Proofs and Efficient Algorithm Design. Electron. Colloquium Comput. Complex., 2019.
[46] Mikhail Belkin et al. Polynomial Learning of Distribution Families. FOCS, 2010.
[47] Ankur Moitra et al. Settling the Polynomial Learnability of Mixtures of Gaussians. FOCS, 2010.
[48] Hans-Peter Kriegel et al. Subspace clustering. WIREs Data Mining Knowl. Discov., 2012.
[49] Yurii Nesterov. Squared Functional Systems and Optimization Problems. 2000.
[50] Daniel M. Kane et al. List-decodable robust mean estimation and learning mixtures of spherical gaussians. STOC, 2017.
[51] Jerry Li et al. Robustly Learning a Gaussian: Getting Optimal Error, Efficiently. SODA, 2017.
[52] Daniel M. Kane et al. Robust Estimators in High Dimensions without the Computational Intractability. FOCS, 2016.
[53] Santosh S. Vempala et al. Isotropic PCA and Affine-Invariant Clustering. FOCS, 2008.
[54] Pravesh Kothari et al. Better Agnostic Clustering Via Relaxed Tensor Norms. arXiv, 2017.
[55] Anima Anandkumar et al. Provable Tensor Methods for Learning Mixtures of Generalized Linear Models. AISTATS, 2014.
[56] David P. Woodruff et al. Faster Algorithms for High-Dimensional Robust Covariance Estimation. COLT, 2019.
[57] R. D. De Veaux et al. Mixtures of linear regressions. 1989.