List-Decodable Linear Regression

We give the first polynomial-time algorithm for robust regression in the list-decodable setting, where an adversary can corrupt more than a $1/2$ fraction of the examples. For any $\alpha < 1$, our algorithm takes as input a sample $\{(x_i, y_i)\}_{i \leq n}$ of $n$ linear equations in which $\alpha n$ of the equations satisfy $y_i = \langle x_i, \ell^*\rangle + \zeta$ for some small noise $\zeta$ and the remaining $(1-\alpha)n$ equations are chosen {\em arbitrarily}. It outputs a list $L$ of size $O(1/\alpha)$, a constant depending only on $\alpha$, that contains an $\ell$ close to $\ell^*$. Our algorithm succeeds whenever the inliers are drawn from a \emph{certifiably} anti-concentrated distribution $D$. In particular, this gives a $(d/\alpha)^{O(1/\alpha^8)}$-time algorithm that finds an $O(1/\alpha)$-size list when the inlier distribution is the standard Gaussian. For discrete product distributions that are anti-concentrated only in \emph{regular} directions, we give an algorithm with a similar guarantee under the promise that $\ell^*$ has all coordinates of the same magnitude. To complement our results, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm is based on a new framework for list-decodable learning that strengthens the `identifiability to algorithms' paradigm built on the sum-of-squares method. In independent and concurrent work, Raghavendra and Yau also used the sum-of-squares method to give a similar result for list-decodable regression.
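To make the input model concrete, the sketch below (Python/NumPy, illustrative only; the variable names, the planted second regressor, and the least-squares baseline are assumptions for the demo, not part of the paper's algorithm) generates a sample in which an $\alpha$ fraction of the equations come from a hidden unit vector $\ell^*$ with standard Gaussian covariates and the rest are adversarial, and then evaluates the list-decoding success criterion $\min_{\ell \in L}\|\ell - \ell^*\|$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha, noise = 10, 2000, 0.25, 0.01   # dimension, sample size, inlier fraction, noise scale

# Hidden regressor ell* (unit norm, chosen arbitrarily for the demo).
ell_star = rng.normal(size=d)
ell_star /= np.linalg.norm(ell_star)

# Inliers: x_i ~ N(0, I_d) (the standard Gaussian, which the paper shows is
# certifiably anti-concentrated) and y_i = <x_i, ell*> + zeta.
m = int(alpha * n)
X_in = rng.normal(size=(m, d))
y_in = X_in @ ell_star + noise * rng.normal(size=m)

# Outliers: the remaining (1 - alpha) n equations are arbitrary; here they are
# planted to fit a second regressor, so the sample "explains" two different vectors.
ell_fake = rng.normal(size=d)
ell_fake /= np.linalg.norm(ell_fake)
X_out = rng.normal(size=(n - m, d))
y_out = X_out @ ell_fake

X = np.vstack([X_in, X_out])
y = np.concatenate([y_in, y_out])

# A single least-squares fit on the corrupted sample lands far from ell*, whereas
# the list-decoding criterion only asks that *some* element of a short list be close.
# The second candidate cheats by using the (unknown to any algorithm) inlier set,
# purely to show what a successful list element looks like.
ols_all = np.linalg.lstsq(X, y, rcond=None)[0]
ols_inliers = np.linalg.lstsq(X_in, y_in, rcond=None)[0]
candidate_list = [ols_all, ols_inliers]

print("single-estimate error:", np.linalg.norm(ols_all - ell_star))
print("list-decoding error:  ", min(np.linalg.norm(c - ell_star) for c in candidate_list))
```

Because the $(1-\alpha)n$ adversarial equations can be exactly consistent with a different regressor, no estimator that returns a single vector can succeed when $\alpha \leq 1/2$; this is why the output is a list of size $O(1/\alpha)$ rather than a single $\ell$.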

[1] J. Lasserre. New Positive Semidefinite Relaxations for Nonconvex Quadratic Programs, 2001.

[2] Pravesh Kothari et al. Outlier-robust moment-estimation via sum-of-squares, 2017, arXiv.

[3] Martin Grötschel et al. The ellipsoid method and its consequences in combinatorial optimization, 1981, Combinatorica.

[4] M. Rudelson et al. The Littlewood-Offord problem and invertibility of random matrices, 2007, arXiv:math/0703503.

[5] John Law et al. Robust Statistics: The Approach Based on Influence Functions, 1986.

[6] M. Laurent. Sums of Squares, Moment Matrices and Optimization Over Polynomials, 2009.

[7] David Steurer et al. Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method, 2014, STOC.

[8] Jerry Li et al. Mixture models, robustness, and sum of squares proofs, 2017, STOC.

[9] Constantine Caramanis et al. A Convex Formulation for Mixed Regression: Near Optimal Rates in the Face of Noise, 2013, arXiv.

[10] Martin J. Wainwright et al. Statistical guarantees for the EM algorithm: From population to sample-based analysis, 2014, arXiv.

[11] Robert A. Jacobs et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.

[12] Prateek Jain et al. Consistent Robust Regression, 2017, NIPS.

[13] Daniel M. Kane et al. Learning geometric concepts with nasty noise, 2017, STOC.

[14] R. D. Veaux et al. Mixtures of linear regressions, 1989.

[15] Daniel M. Kane et al. Robust Estimators in High Dimensions without the Computational Intractability, 2016, FOCS.

[16] J. Gallier. Quadratic Optimization Problems, 2020, Linear Algebra and Optimization with Applications to Machine Learning.

[17] Rocco A. Servedio et al. Bounded Independence Fools Halfspaces, 2009, FOCS.

[18] P. Erdős. On a lemma of Littlewood and Offord, 1945.

[19] Sivaraman Balakrishnan et al. Robust estimation via robust gradient estimation, 2018, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[20] Rocco A. Servedio et al. Learning Halfspaces with Malicious Noise, 2009, ICALP.

[21] Terence Tao et al. The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi, 2010, Combinatorica.

[22] D. Lubinsky. A Survey of Weighted Approximation for Exponential Weights, 2007, arXiv:math/0701099.

[23] Ilias Diakonikolas et al. Efficient Algorithms and Lower Bounds for Robust Linear Regression, 2018, SODA.

[24] Yu Cheng et al. High-Dimensional Robust Mean Estimation in Nearly-Linear Time, 2018, SODA.

[25] Holger Sambale et al. Concentration inequalities for polynomials in α-sub-exponential random variables, 2019, Electronic Journal of Probability.

[26] Tengyu Ma et al. Polynomial-Time Tensor Decompositions with Sum-of-Squares, 2016, FOCS.

[27] Santosh S. Vempala et al. Agnostic Estimation of Mean and Covariance, 2016, FOCS.

[28] Eric Price et al. Compressed Sensing with Adversarial Sparse Noise via L1 Regression, 2018, SOSA.

[29] Prasad Raghavendra et al. List Decodable Learning via Sum of Squares, 2019, SODA.

[30] Ankur Moitra et al. Noisy tensor completion via the sum-of-squares hierarchy, 2015, Mathematical Programming.

[31] Constantine Caramanis et al. Alternating Minimization for Mixed Linear Regression, 2013, ICML.

[32] Jerry Li et al. Robustly Learning a Gaussian: Getting Optimal Error, Efficiently, 2017, SODA.

[33] Ankur Moitra et al. Settling the Polynomial Learnability of Mixtures of Gaussians, 2010, FOCS.

[34] Yurii Nesterov. Squared Functional Systems and Optimization Problems, 2000.

[35] Santosh S. Vempala et al. A discriminative framework for clustering via similarity functions, 2008, STOC.

[36] Anima Anandkumar et al. Provable Tensor Methods for Learning Mixtures of Generalized Linear Models, 2014, AISTATS.

[37] Inderjit S. Dhillon et al. Mixed Linear Regression with Multiple Components, 2016, NIPS.