Statistical Query Lower Bounds for Robust Estimation of High-Dimensional Gaussians and Gaussian Mixtures

We describe a general technique that yields the first Statistical Query lower bounds for a range of fundamental high-dimensional learning problems involving Gaussian distributions. Our main results are for the problems of (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown Gaussian distribution. For each of these problems, we show a super-polynomial gap between the (information-theoretic) sample complexity and the computational complexity of any Statistical Query algorithm for the problem. Statistical Query (SQ) algorithms are a class of algorithms that are only allowed to query expectations of functions of the distribution rather than directly access samples. This class of algorithms is quite broad: a wide range of known algorithmic techniques in machine learning are known to be implementable using SQs. Moreover, for the unsupervised learning problems studied in this paper, all known algorithms with non-trivial performance guarantees are SQ or are easily implementable using SQs. Our SQ lower bound for Problem (1) is qualitatively matched by known learning algorithms for GMMs. At a conceptual level, this result implies that, as far as SQ algorithms are concerned, the computational complexity of learning GMMs is inherently exponential in the dimension of the latent space, even though there is no such information-theoretic barrier. Our lower bound for Problem (2) implies that the accuracy of the robust learning algorithm of [83] is essentially best possible among all polynomial-time SQ algorithms. On the positive side, we also give a new (SQ) learning algorithm for Problem (2) achieving the information-theoretically optimal accuracy, up to a constant factor, whose running time essentially matches our lower bound. Our algorithm relies on a filtering technique generalizing [83] that removes outliers based on higher-order tensors. Our SQ lower bounds are attained via a unified moment-matching technique that is useful in other contexts and may be of broader interest. Our technique yields nearly-tight lower bounds for a number of related unsupervised estimation problems. Specifically, for the problems of (3) robust covariance estimation in spectral norm, and (4) robust sparse mean estimation, we establish a quadratic statistical–computational tradeoff for SQ algorithms, matching known upper bounds. Finally, our technique can be used to obtain tight sample complexity lower bounds for high-dimensional testing problems. Specifically, for the classical problem of robustly testing an unknown mean (known covariance) Gaussian, our technique implies an information-theoretic sample lower bound that scales linearly in the dimension. Our sample lower bound matches the sample complexity of the corresponding robust learning problem and separates the sample complexity of robust testing from standard (non-robust) testing. This separation is surprising because such a gap does not exist for the corresponding learning problem.
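
To make the SQ model concrete, the following minimal Python sketch (an illustration, not code from the paper) simulates a STAT(tau) oracle from i.i.d. samples: the algorithm supplies a bounded query function f and receives the expectation E[f(X)] only up to an error of magnitude at most tau. The function and constants below are hypothetical names chosen for the example.

```python
import numpy as np

def simulate_stat_oracle(samples, f, tau, rng=None):
    """Simulate a STAT(tau) oracle for a bounded query f : R^d -> [-1, 1].

    A legal oracle may return any value within tau of the true expectation.
    Here we model that slack by perturbing the empirical mean, which is
    itself tau-close to E[f(X)] once enough samples are averaged.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.clip([f(x) for x in samples], -1.0, 1.0)  # enforce boundedness
    empirical_mean = float(np.mean(values))
    # Adversarial slack: any answer in [mean - tau, mean + tau] is allowed.
    return empirical_mean + rng.uniform(-tau, tau)

# Example: query a (clipped, rescaled) second moment of N(0, I_d) data
# along a fixed direction v, to tolerance tau = 0.01.
d, n, tau = 10, 50_000, 0.01
X = np.random.default_rng(0).standard_normal((n, d))
v = np.zeros(d); v[0] = 1.0
answer = simulate_stat_oracle(X, lambda x: np.clip(x @ v, -3, 3) ** 2 / 9, tau)
```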

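The filtering idea that the paper's algorithm generalizes can also be sketched in a few lines. The sketch below shows the simpler second-moment filter in the spirit of [83] for robust mean estimation under an assumed identity covariance; the paper's algorithm replaces the single-direction variance check with conditions on higher-order moment tensors, which is what yields the optimal accuracy. Function name and thresholds are illustrative, not from the paper.

```python
import numpy as np

def filtered_mean(X, eps, var_threshold=1.2):
    """Simplified second-moment filter for the mean of N(mu, I).

    X: (n, d) array in which an eps-fraction of rows may be adversarial.
    Repeat: find the top eigenvector of the empirical covariance; if the
    variance along it is much larger than 1 (the inlier value), discard
    the points with the most extreme projections along that direction.
    """
    X = np.asarray(X, dtype=float)
    while True:
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        top_var, v = eigvals[-1], eigvecs[:, -1]
        if top_var <= var_threshold:          # spectrum looks Gaussian: done
            return mu
        scores = np.abs((X - mu) @ v)         # outlyingness in worst direction
        keep = scores.argsort()[: int(len(X) * (1 - eps))]
        if len(keep) == len(X):               # nothing left to remove
            return mu
        X = X[keep]                           # filter and re-examine
```

Each round either certifies that every direction has near-Gaussian variance or provably removes more outliers than inliers, so the loop terminates with a mean estimate close to mu.
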
[1] H. Hotelling. The Generalization of Student’s Ratio, 1931.

[2] E. S. Pearson, et al. On the Problem of the Most Efficient Tests of Statistical Hypotheses, 1933.

[3] J. Kalbfleisch. Statistical Inference Under Order Restrictions, 1975.

[4] J. Tukey. Mathematics and the Picturing of Data, 1975.

[5] Frederick R. Forst, et al. On robust estimation of the location parameter, 1980.

[6] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[7] D. Ruppert. Robust Statistics: The Approach Based on Influence Functions, 1987.

[8] L. Devroye, et al. Nonparametric density estimation: the L1 view, 1987.

[9] D. W. Scott, et al. Multivariate Density Estimation, Theory, Practice and Visualization, 1992.

[10] D. Donoho, et al. Breakdown Properties of Location Estimates Based on Halfspace Depth and Projected Outlyingness, 1992.

[11] Michael Kearns, et al. Efficient noise-tolerant learning from statistical queries, 1993, STOC.

[12] Yishay Mansour, et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis, 1994, STOC '94.

[13] Ronitt Rubinfeld, et al. On the learnability of discrete distributions, 1994, STOC '94.

[14] R. Wilcox. Introduction to Robust Estimation and Hypothesis Testing, 1997.

[15] Sanjoy Dasgupta, et al. Learning mixtures of Gaussians, 1999, 40th Annual Symposium on Foundations of Computer Science.

[16] Z. Bai, et al. Effect of High Dimension: By an Example of a Two Sample Problem, 1999.

[17] Yishay Mansour, et al. Estimating a mixture of two product distributions, 1999, COLT '99.

[18] Ronitt Rubinfeld, et al. Testing that distributions are close, 2000, 41st Annual Symposium on Foundations of Computer Science.

[19] Sanjeev Arora, et al. Learning mixtures of arbitrary Gaussians, 2001, STOC '01.

[20] Luc Devroye, et al. Combinatorial methods in density estimation, 2001, Springer Series in Statistics.

[21] Paul W. Goldberg, et al. Evolutionary Trees Can be Learned in Polynomial Time in the Two-State General Markov Model, 2001, SIAM J. Comput.

[22] Santosh S. Vempala, et al. A spectral algorithm for learning mixtures of distributions, 2002, 43rd Annual IEEE Symposium on Foundations of Computer Science.

[23] S. Sheather. Density Estimation, 2004.

[24] B. Ripley, et al. Robust Statistics, 2018, Encyclopedia of Mathematical Geosciences.

[25] Dimitris Achlioptas, et al. On Spectral Learning of Mixtures of Distributions, 2005, COLT.

[26] Julia Kastner, et al. Introduction to Robust Estimation and Hypothesis Testing, 2005.

[27] Elchanan Mossel, et al. Learning nonsingular phylogenies and hidden Markov models, 2005, STOC '05.

[28] Stephen E. Fienberg, et al. Testing Statistical Hypotheses, 2005.

[29] Rocco A. Servedio, et al. Agnostically learning halfspaces, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[30] Jon Feldman, et al. Learning mixtures of product distributions over discrete domains, 2005, FOCS.

[31] Rene F. Swarttouw, et al. Orthogonal polynomials, 2020, NIST Handbook of Mathematical Functions.

[32] J. Feldman, et al. PAC Learning Mixtures of Gaussians with No Separation Assumption, 2006.

[33] Kunle Olukotun, et al. Map-Reduce for Machine Learning on Multicore, 2006, NIPS.

[34] Vitaly Feldman, et al. New Results for Learning Noisy Parities and Halfspaces, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[35] Alexander A. Sherstov, et al. Cryptographic Hardness for Learning Intersections of Halfspaces, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[36] Seshadhri Comandur, et al. Testing Expansion in Bounded Degree Graphs, 2007, Electron. Colloquium Comput. Complex.

[37] Santosh S. Vempala, et al. The Spectral Method for General Mixture Models, 2008, SIAM J. Comput.

[38] Santosh S. Vempala, et al. Isotropic PCA and Affine-Invariant Clustering, 2008, 49th Annual IEEE Symposium on Foundations of Computer Science.

[39] M. Srivastava, et al. A test for the mean vector with fewer observations than the dimension, 2008.

[40] Vitaly Feldman, et al. Statistical Query Learning, 2008, Encyclopedia of Algorithms.

[41] I. Johnstone, et al. On Consistency and Sparsity for Principal Components Analysis in High Dimensions, 2009, Journal of the American Statistical Association.

[42] R. Vershynin. How Close is the Sample Covariance Matrix to the Actual Covariance Matrix?, 2010, arXiv:1004.3484.

[43] Adam Tauman Kalai, et al. Efficiently learning mixtures of two Gaussians, 2010, STOC '10.

[44] Song-xi Chen, et al. A two-sample test for high-dimensional data with applications to gene-set testing, 2010, arXiv:1002.4547.

[45] Ankur Moitra, et al. Settling the Polynomial Learnability of Mixtures of Gaussians, 2010, IEEE 51st Annual Symposium on Foundations of Computer Science.

[46] Mikhail Belkin, et al. Polynomial Learning of Distribution Families, 2010, IEEE 51st Annual Symposium on Foundations of Computer Science.

[47] P. Rigollet, et al. Optimal detection of sparse principal components in high dimension, 2012, arXiv:1202.5070.

[48] T. Sanders, et al. Analysis of Boolean Functions, 2012, arXiv.

[49] T. Cai, et al. Sparse PCA: Optimal rates and adaptive estimation, 2012, arXiv:1211.1309.

[50] Jianqing Fan, et al. Distributions of angles in random packing on spheres, 2013, J. Mach. Learn. Res.

[51] Philippe Rigollet, et al. Complexity Theoretic Lower Bounds for Sparse Principal Component Detection, 2013, COLT.

[52] Sham M. Kakade, et al. Learning mixtures of spherical Gaussians: moment methods and spectral decompositions, 2012, ITCS '13.

[53] Ryan O'Donnell, et al. Learning Sums of Independent Integer Random Variables, 2013, IEEE 54th Annual Symposium on Foundations of Computer Science.

[54] Rocco A. Servedio, et al. Learning mixtures of structured distributions over discrete domains, 2012, SODA.

[55] Rocco A. Servedio, et al. Learning k-Modal Distributions via Testing, 2012, Theory Comput.

[56] Martin J. Wainwright, et al. Lower bounds on the performance of polynomial-time algorithms for sparse linear regression, 2014, COLT.

[57] Pravesh Kothari, et al. Embedding Hard Learning Problems into Gaussian Space, 2014, Electron. Colloquium Comput. Complex.

[58] Ryan O'Donnell, et al. Analysis of Boolean Functions, 2014, arXiv.

[59] Constantinos Daskalakis, et al. Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians, 2013, COLT.

[60] Rocco A. Servedio, et al. Efficient Density Estimation via Piecewise Polynomial Approximation, 2013.

[61] Aditya Bhaskara, et al. Smoothed analysis of tensor decompositions, 2013, STOC.

[62] Alon Orlitsky, et al. Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures, 2014, NIPS.

[63] Nathan Linial, et al. From average case complexity to improper learning complexity, 2013, STOC.

[64] Santosh S. Vempala, et al. On the Complexity of Random Satisfiability Problems with Planted Solutions, 2018.

[65] Mikhail Belkin, et al. The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures, 2013, COLT.

[66] Rocco A. Servedio, et al. Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms, 2014, NIPS.

[67] Quentin Berthet, et al. Statistical and computational trade-offs in estimation of sparse principal components, 2014, arXiv:1408.5369.

[68] Santosh S. Vempala, et al. Statistical Query Algorithms for Stochastic Convex Optimization, 2015, arXiv.

[69] Qingqing Huang, et al. Learning Mixtures of Gaussians in High Dimensions, 2015, STOC.

[70] Rocco A. Servedio, et al. Learning from satisfying assignments, 2015, SODA.

[71] Ludwig Schmidt, et al. A Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k, 2015, arXiv.

[72] Avi Wigderson, et al. Sum-of-Squares Lower Bounds for Sparse PCA, 2015, NIPS.

[73] Chao Gao, et al. Robust Covariance Matrix Estimation via Matrix Depth, 2015.

[74] Rocco A. Servedio, et al. Learning Poisson Binomial Distributions, 2011, STOC '12.

[75] T. Cai, et al. Optimal estimation and rank detection for sparse spiked covariance matrices, 2013, Probability Theory and Related Fields.

[76] Moritz Hardt, et al. Tight Bounds for Learning a Mixture of Two Gaussians, 2014, STOC.

[77] Daniel M. Kane, et al. Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables, 2015, COLT.

[78] Santosh S. Vempala, et al. Agnostic Estimation of Mean and Covariance, 2016, IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[79] Ming-Yang Kao, et al. Encyclopedia of Algorithms, 2016, Springer New York.

[80] Amit Daniely, et al. Complexity theoretic limitations on learning halfspaces, 2015, STOC.

[81] Santosh S. Vempala, et al. Beyond Spectral: Tight Bounds for Planted Gaussians, 2016, arXiv.

[82] Anindya De, et al. A size-free CLT for Poisson multinomials and its applications, 2015, STOC.

[83] Daniel M. Kane, et al. Robust Estimators in High Dimensions without the Computational Intractability, 2016, IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[84] Daniel M. Kane, et al. The Fourier transform of Poisson multinomial distributions and its algorithmic applications, 2015, STOC.

[85] Ilias Diakonikolas. Learning Structured Distributions, 2016, Handbook of Big Data.

[86] Jerry Li, et al. Being Robust (in High Dimensions) Can Be Practical, 2017, ICML.

[87] Santosh S. Vempala, et al. Statistical Algorithms and a Lower Bound for Detecting Planted Cliques, 2012, J. ACM.

[88] Jerry Li, et al. Robust Sparse Estimation Tasks in High Dimensions, 2017, arXiv.

[89] Vitaly Feldman, et al. A General Characterization of the Statistical Query Complexity, 2016, COLT.

[90] Sivaraman Balakrishnan, et al. Computationally Efficient Robust Estimation of Sparse Functionals, 2017, arXiv.

[91] Ilias Diakonikolas, et al. Sample-Optimal Density Estimation in Nearly-Linear Time, 2015, SODA.

[92] Daniel M. Kane, et al. Robust Learning of Fixed-Structure Bayesian Networks, 2016, NeurIPS.

[93] Jerry Li, et al. Robustly Learning a Gaussian: Getting Optimal Error, Efficiently, 2017, SODA.

[94] Ankur Moitra. Algorithmic Aspects of Machine Learning, 2018.