Statistical Query Lower Bounds for Robust Estimation of High-Dimensional Gaussians and Gaussian Mixtures

We describe a general technique that yields the first Statistical Query lower bounds for a range of fundamental high-dimensional learning problems involving Gaussian distributions. Our main results are for the problems of (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown Gaussian distribution. For each of these problems, we show a super-polynomial gap between the (information-theoretic) sample complexity and the computational complexity of any Statistical Query algorithm for the problem. Statistical Query (SQ) algorithms are a class of algorithms that are only allowed to query expectations of functions of the distribution rather than directly access samples. This class of algorithms is quite broad: a wide range of known algorithmic techniques in machine learning are known to be implementable using SQs. Moreover, for the unsupervised learning problems studied in this paper, all known algorithms with non-trivial performance guarantees are SQ or are easily implementable using SQs. Our SQ lower bound for Problem (1) is qualitatively matched by known learning algorithms for GMMs. At a conceptual level, this result implies that, as far as SQ algorithms are concerned, the computational complexity of learning GMMs is inherently exponential in the dimension of the latent space, even though there is no such information-theoretic barrier. Our lower bound for Problem (2) implies that the accuracy of the robust learning algorithm of [83] is essentially best possible among all polynomial-time SQ algorithms. On the positive side, we also give a new (SQ) learning algorithm for Problem (2) achieving the information-theoretically optimal accuracy, up to a constant factor, whose running time essentially matches our lower bound. Our algorithm relies on a filtering technique generalizing [83] that removes outliers based on higher-order tensors. Our SQ lower bounds are attained via a unified moment-matching technique that is useful in other contexts and may be of broader interest. Our technique yields nearly-tight lower bounds for a number of related unsupervised estimation problems. Specifically, for the problems of (3) robust covariance estimation in spectral norm, and (4) robust sparse mean estimation, we establish a quadratic statistical–computational tradeoff for SQ algorithms, matching known upper bounds. Finally, our technique can be used to obtain tight sample complexity lower bounds for high-dimensional testing problems. Specifically, for the classical problem of robustly testing an unknown mean (known covariance) Gaussian, our technique implies an information-theoretic sample lower bound that scales linearly in the dimension. Our sample lower bound matches the sample complexity of the corresponding robust learning problem and separates the sample complexity of robust testing from standard (non-robust) testing. This separation is surprising because such a gap does not exist for the corresponding learning problem.
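
To make the SQ model concrete, the following minimal Python sketch (an illustration, not code from the paper) simulates a STAT(tau) oracle from i.i.d. samples: the algorithm supplies a bounded query function f and receives the expectation E[f(X)] only up to an error of magnitude at most tau. The function and constants below are hypothetical names chosen for the example.

```python
import numpy as np

def simulate_stat_oracle(samples, f, tau, rng=None):
    """Simulate a STAT(tau) oracle for a bounded query f : R^d -> [-1, 1].

    A legal oracle may return any value within tau of the true expectation.
    Here we model that slack by perturbing the empirical mean, which is
    itself tau-close to E[f(X)] once enough samples are averaged.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.clip([f(x) for x in samples], -1.0, 1.0)  # enforce boundedness
    empirical_mean = float(np.mean(values))
    # Adversarial slack: any answer in [mean - tau, mean + tau] is allowed.
    return empirical_mean + rng.uniform(-tau, tau)

# Example: query a (clipped, rescaled) second moment of N(0, I_d) data
# along a fixed direction v, to tolerance tau = 0.01.
d, n, tau = 10, 50_000, 0.01
X = np.random.default_rng(0).standard_normal((n, d))
v = np.zeros(d); v[0] = 1.0
answer = simulate_stat_oracle(X, lambda x: np.clip(x @ v, -3, 3) ** 2 / 9, tau)
```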

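The filtering idea that the paper's algorithm generalizes can also be sketched in a few lines. The sketch below shows the simpler second-moment filter in the spirit of [83] for robust mean estimation under an assumed identity covariance; the paper's algorithm replaces the single-direction variance check with conditions on higher-order moment tensors, which is what yields the optimal accuracy. Function name and thresholds are illustrative, not from the paper.

```python
import numpy as np

def filtered_mean(X, eps, var_threshold=1.2):
    """Simplified second-moment filter for the mean of N(mu, I).

    X: (n, d) array in which an eps-fraction of rows may be adversarial.
    Repeat: find the top eigenvector of the empirical covariance; if the
    variance along it is much larger than 1 (the inlier value), discard
    the points with the most extreme projections along that direction.
    """
    X = np.asarray(X, dtype=float)
    while True:
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        top_var, v = eigvals[-1], eigvecs[:, -1]
        if top_var <= var_threshold:          # spectrum looks Gaussian: done
            return mu
        scores = np.abs((X - mu) @ v)         # outlyingness in worst direction
        keep = scores.argsort()[: int(len(X) * (1 - eps))]
        if len(keep) == len(X):               # nothing left to remove
            return mu
        X = X[keep]                           # filter and re-examine
```

Each round either certifies that every direction has near-Gaussian variance or provably removes more outliers than inliers, so the loop terminates with a mean estimate close to mu.
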
[1] H. Hotelling. The Generalization of Student’s Ratio, 1931.

[2] E. S. Pearson, et al. On the Problem of the Most Efficient Tests of Statistical Hypotheses, 1933.

[3] J. Kalbfleisch. Statistical Inference Under Order Restrictions, 1975.

[4] J. Tukey. Mathematics and the Picturing of Data, 1975.

[5] Frederick R. Forst, et al. On robust estimation of the location parameter, 1980.

[6] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[7] D. Ruppert. Robust Statistics: The Approach Based on Influence Functions, 1987.

[8] L. Devroye, et al. Nonparametric density estimation: the L1 view, 1987.

[9] D. W. Scott, et al. Multivariate Density Estimation, Theory, Practice and Visualization, 1992.

[10] D. Donoho, et al. Breakdown Properties of Location Estimates Based on Halfspace Depth and Projected Outlyingness, 1992.

[11] Michael Kearns, et al. Efficient noise-tolerant learning from statistical queries, 1993, STOC.

[12] Yishay Mansour, et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis, 1994, STOC '94.

[13] Ronitt Rubinfeld, et al. On the learnability of discrete distributions, 1994, STOC '94.

[14] R. Wilcox. Introduction to Robust Estimation and Hypothesis Testing, 1997.

[15] Sanjoy Dasgupta, et al. Learning mixtures of Gaussians, 1999, 40th Annual Symposium on Foundations of Computer Science.

[16] Z. Bai, et al. Effect of High Dimension: By an Example of a Two Sample Problem, 1999.

[17] Yishay Mansour, et al. Estimating a mixture of two product distributions, 1999, COLT '99.

[18] Ronitt Rubinfeld, et al. Testing that distributions are close, 2000, 41st Annual Symposium on Foundations of Computer Science.

[19] Sanjeev Arora, et al. Learning mixtures of arbitrary Gaussians, 2001, STOC '01.

[20] Luc Devroye, et al. Combinatorial methods in density estimation, 2001, Springer Series in Statistics.

[21] Paul W. Goldberg, et al. Evolutionary Trees Can be Learned in Polynomial Time in the Two-State General Markov Model, 2001, SIAM J. Comput.

[22] Santosh S. Vempala, et al. A spectral algorithm for learning mixtures of distributions, 2002, 43rd Annual IEEE Symposium on Foundations of Computer Science.

[23] S. Sheather. Density Estimation, 2004.

[24] B. Ripley, et al. Robust Statistics, 2018, Encyclopedia of Mathematical Geosciences.

[25] Dimitris Achlioptas, et al. On Spectral Learning of Mixtures of Distributions, 2005, COLT.

[26] Julia Kastner, et al. Introduction to Robust Estimation and Hypothesis Testing, 2005.

[27] Elchanan Mossel, et al. Learning nonsingular phylogenies and hidden Markov models, 2005, STOC '05.

[28] Stephen E. Fienberg, et al. Testing Statistical Hypotheses, 2005.

[29] Rocco A. Servedio, et al. Agnostically learning halfspaces, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[30] Jon Feldman, et al. Learning mixtures of product distributions over discrete domains, 2005, FOCS.

[31] Rene F. Swarttouw, et al. Orthogonal polynomials, 2020, NIST Handbook of Mathematical Functions.

[32] J. Feldman, et al. PAC Learning Mixtures of Gaussians with No Separation Assumption, 2006.

[33] Kunle Olukotun, et al. Map-Reduce for Machine Learning on Multicore, 2006, NIPS.

[34] Vitaly Feldman, et al. New Results for Learning Noisy Parities and Halfspaces, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[35] Alexander A. Sherstov, et al. Cryptographic Hardness for Learning Intersections of Halfspaces, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[36] Seshadhri Comandur, et al. Testing Expansion in Bounded Degree Graphs, 2007, Electron. Colloquium Comput. Complex.

[37] Santosh S. Vempala, et al. The Spectral Method for General Mixture Models, 2008, SIAM J. Comput.

[38] Santosh S. Vempala, et al. Isotropic PCA and Affine-Invariant Clustering, 2008, 49th Annual IEEE Symposium on Foundations of Computer Science.

[39] M. Srivastava, et al. A test for the mean vector with fewer observations than the dimension, 2008.

[40] Vitaly Feldman, et al. Statistical Query Learning, 2008, Encyclopedia of Algorithms.

[41] I. Johnstone, et al. On Consistency and Sparsity for Principal Components Analysis in High Dimensions, 2009, Journal of the American Statistical Association.

[42] R. Vershynin. How Close is the Sample Covariance Matrix to the Actual Covariance Matrix?, 2010, arXiv:1004.3484.

[43] Adam Tauman Kalai, et al. Efficiently learning mixtures of two Gaussians, 2010, STOC '10.

[44] Song-xi Chen, et al. A two-sample test for high-dimensional data with applications to gene-set testing, 2010, arXiv:1002.4547.

[45] Ankur Moitra, et al. Settling the Polynomial Learnability of Mixtures of Gaussians, 2010, IEEE 51st Annual Symposium on Foundations of Computer Science.

[46] Mikhail Belkin, et al. Polynomial Learning of Distribution Families, 2010, IEEE 51st Annual Symposium on Foundations of Computer Science.

[47] P. Rigollet, et al. Optimal detection of sparse principal components in high dimension, 2012, arXiv:1202.5070.

[48] T. Sanders, et al. Analysis of Boolean Functions, 2012, arXiv.

[49] T. Cai, et al. Sparse PCA: Optimal rates and adaptive estimation, 2012, arXiv:1211.1309.

[50] Jianqing Fan, et al. Distributions of angles in random packing on spheres, 2013, J. Mach. Learn. Res.

[51] Philippe Rigollet, et al. Complexity Theoretic Lower Bounds for Sparse Principal Component Detection, 2013, COLT.

[52] Sham M. Kakade, et al. Learning mixtures of spherical Gaussians: moment methods and spectral decompositions, 2012, ITCS '13.

[53] Ryan O'Donnell, et al. Learning Sums of Independent Integer Random Variables, 2013, IEEE 54th Annual Symposium on Foundations of Computer Science.

[54] Rocco A. Servedio, et al. Learning mixtures of structured distributions over discrete domains, 2012, SODA.

[55] Rocco A. Servedio, et al. Learning k-Modal Distributions via Testing, 2012, Theory Comput.

[56] Martin J. Wainwright, et al. Lower bounds on the performance of polynomial-time algorithms for sparse linear regression, 2014, COLT.

[57] Pravesh Kothari, et al. Embedding Hard Learning Problems into Gaussian Space, 2014, Electron. Colloquium Comput. Complex.

[58] Ryan O'Donnell, et al. Analysis of Boolean Functions, 2014, arXiv.

[59] Constantinos Daskalakis, et al. Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians, 2013, COLT.

[60] Rocco A. Servedio, et al. Efficient Density Estimation via Piecewise Polynomial Approximation, 2013.

[61] Aditya Bhaskara, et al. Smoothed analysis of tensor decompositions, 2013, STOC.

[62] Alon Orlitsky, et al. Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures, 2014, NIPS.

[63] Nathan Linial, et al. From average case complexity to improper learning complexity, 2013, STOC.

[64] Santosh S. Vempala, et al. On the Complexity of Random Satisfiability Problems with Planted Solutions, 2018.

[65] Mikhail Belkin, et al. The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures, 2013, COLT.

[66] Rocco A. Servedio, et al. Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms, 2014, NIPS.

[67] Quentin Berthet, et al. Statistical and computational trade-offs in estimation of sparse principal components, 2014, arXiv:1408.5369.

[68] Santosh S. Vempala, et al. Statistical Query Algorithms for Stochastic Convex Optimization, 2015, arXiv.

[69] Qingqing Huang, et al. Learning Mixtures of Gaussians in High Dimensions, 2015, STOC.

[70] Rocco A. Servedio, et al. Learning from satisfying assignments, 2015, SODA.

[71] Ludwig Schmidt, et al. A Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k, 2015, arXiv.

[72] Avi Wigderson, et al. Sum-of-Squares Lower Bounds for Sparse PCA, 2015, NIPS.

[73] Chao Gao, et al. Robust Covariance Matrix Estimation via Matrix Depth, 2015.

[74] Rocco A. Servedio, et al. Learning Poisson Binomial Distributions, 2011, STOC '12.

[75] T. Cai, et al. Optimal estimation and rank detection for sparse spiked covariance matrices, 2013, Probability Theory and Related Fields.

[76] Moritz Hardt, et al. Tight Bounds for Learning a Mixture of Two Gaussians, 2014, STOC.

[77] Daniel M. Kane, et al. Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables, 2015, COLT.

[78] Santosh S. Vempala, et al. Agnostic Estimation of Mean and Covariance, 2016, IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[79] Ming-Yang Kao, et al. Encyclopedia of Algorithms, 2016, Springer New York.

[80] Amit Daniely, et al. Complexity theoretic limitations on learning halfspaces, 2015, STOC.

[81] Santosh S. Vempala, et al. Beyond Spectral: Tight Bounds for Planted Gaussians, 2016, arXiv.

[82] Anindya De, et al. A size-free CLT for Poisson multinomials and its applications, 2015, STOC.

[83] Daniel M. Kane, et al. Robust Estimators in High Dimensions without the Computational Intractability, 2016, IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[84] Daniel M. Kane, et al. The Fourier transform of Poisson multinomial distributions and its algorithmic applications, 2015, STOC.

[85] Ilias Diakonikolas. Learning Structured Distributions, 2016, Handbook of Big Data.

[86] Jerry Li, et al. Being Robust (in High Dimensions) Can Be Practical, 2017, ICML.

[87] Santosh S. Vempala, et al. Statistical Algorithms and a Lower Bound for Detecting Planted Cliques, 2012, J. ACM.

[88] Jerry Li, et al. Robust Sparse Estimation Tasks in High Dimensions, 2017, arXiv.

[89] Vitaly Feldman, et al. A General Characterization of the Statistical Query Complexity, 2016, COLT.

[90] Sivaraman Balakrishnan, et al. Computationally Efficient Robust Estimation of Sparse Functionals, 2017, arXiv.

[91] Ilias Diakonikolas, et al. Sample-Optimal Density Estimation in Nearly-Linear Time, 2015, SODA.

[92] Daniel M. Kane, et al. Robust Learning of Fixed-Structure Bayesian Networks, 2016, NeurIPS.

[93] Jerry Li, et al. Robustly Learning a Gaussian: Getting Optimal Error, Efficiently, 2017, SODA.

[94] Ankur Moitra. Algorithmic Aspects of Machine Learning, 2018.