List-Decodable Linear Regression

We give the first polynomial-time algorithm for robust regression in the list-decodable setting, where an adversary can corrupt more than a $1/2$ fraction of the examples. For any $\alpha < 1$, our algorithm takes as input a sample $\{(x_i, y_i)\}_{i \leq n}$ of $n$ linear equations in which $\alpha n$ of the equations satisfy $y_i = \langle x_i, \ell^*\rangle + \zeta$ for some small noise $\zeta$ and the remaining $(1-\alpha)n$ equations are chosen {\em arbitrarily}. It outputs a list $L$ of size $O(1/\alpha)$, a constant depending only on $\alpha$, that contains an $\ell$ close to $\ell^*$. Our algorithm succeeds whenever the inliers are drawn from a \emph{certifiably} anti-concentrated distribution $D$. In particular, this gives a $(d/\alpha)^{O(1/\alpha^8)}$-time algorithm that finds an $O(1/\alpha)$-size list when the inlier distribution is the standard Gaussian. For discrete product distributions that are anti-concentrated only in \emph{regular} directions, we give an algorithm with a similar guarantee under the promise that $\ell^*$ has all coordinates of the same magnitude. To complement our results, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm is based on a new framework for list-decodable learning that strengthens the `identifiability to algorithms' paradigm built on the sum-of-squares method. In independent and concurrent work, Raghavendra and Yau also used the sum-of-squares method to give a similar result for list-decodable regression.
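To make the input model concrete, the sketch below (Python/NumPy, illustrative only; the variable names, the planted second regressor, and the least-squares baseline are assumptions for the demo, not part of the paper's algorithm) generates a sample in which an $\alpha$ fraction of the equations come from a hidden unit vector $\ell^*$ with standard Gaussian covariates and the rest are adversarial, and then evaluates the list-decoding success criterion $\min_{\ell \in L}\|\ell - \ell^*\|$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha, noise = 10, 2000, 0.25, 0.01   # dimension, sample size, inlier fraction, noise scale

# Hidden regressor ell* (unit norm, chosen arbitrarily for the demo).
ell_star = rng.normal(size=d)
ell_star /= np.linalg.norm(ell_star)

# Inliers: x_i ~ N(0, I_d) (the standard Gaussian, which the paper shows is
# certifiably anti-concentrated) and y_i = <x_i, ell*> + zeta.
m = int(alpha * n)
X_in = rng.normal(size=(m, d))
y_in = X_in @ ell_star + noise * rng.normal(size=m)

# Outliers: the remaining (1 - alpha) n equations are arbitrary; here they are
# planted to fit a second regressor, so the sample "explains" two different vectors.
ell_fake = rng.normal(size=d)
ell_fake /= np.linalg.norm(ell_fake)
X_out = rng.normal(size=(n - m, d))
y_out = X_out @ ell_fake

X = np.vstack([X_in, X_out])
y = np.concatenate([y_in, y_out])

# A single least-squares fit on the corrupted sample lands far from ell*, whereas
# the list-decoding criterion only asks that *some* element of a short list be close.
# The second candidate cheats by using the (unknown to any algorithm) inlier set,
# purely to show what a successful list element looks like.
ols_all = np.linalg.lstsq(X, y, rcond=None)[0]
ols_inliers = np.linalg.lstsq(X_in, y_in, rcond=None)[0]
candidate_list = [ols_all, ols_inliers]

print("single-estimate error:", np.linalg.norm(ols_all - ell_star))
print("list-decoding error:  ", min(np.linalg.norm(c - ell_star) for c in candidate_list))
```

Because the $(1-\alpha)n$ adversarial equations can be exactly consistent with a different regressor, no estimator that returns a single vector can succeed when $\alpha \leq 1/2$; this is why the output is a list of size $O(1/\alpha)$ rather than a single $\ell$.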

[1] J. Lasserre. New Positive Semidefinite Relaxations for Nonconvex Quadratic Programs, 2001.

[2] Pravesh Kothari et al. Outlier-robust moment-estimation via sum-of-squares, 2017, arXiv.

[3] Martin Grötschel et al. The ellipsoid method and its consequences in combinatorial optimization, 1981, Combinatorica.

[4] M. Rudelson et al. The Littlewood-Offord problem and invertibility of random matrices, 2007, arXiv:math/0703503.

[5] John Law et al. Robust Statistics: The Approach Based on Influence Functions, 1986.

[6] M. Laurent. Sums of Squares, Moment Matrices and Optimization Over Polynomials, 2009.

[7] David Steurer et al. Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method, 2014, STOC.

[8] Jerry Li et al. Mixture models, robustness, and sum of squares proofs, 2017, STOC.

[9] Constantine Caramanis et al. A Convex Formulation for Mixed Regression: Near Optimal Rates in the Face of Noise, 2013, arXiv.

[10] Martin J. Wainwright et al. Statistical guarantees for the EM algorithm: From population to sample-based analysis, 2014, arXiv.

[11] Robert A. Jacobs et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.

[12] Prateek Jain et al. Consistent Robust Regression, 2017, NIPS.

[13] Daniel M. Kane et al. Learning geometric concepts with nasty noise, 2017, STOC.

[14] R. D. Veaux et al. Mixtures of linear regressions, 1989.

[15] Daniel M. Kane et al. Robust Estimators in High Dimensions without the Computational Intractability, 2016, FOCS.

[16] J. Gallier. Quadratic Optimization Problems, 2020, Linear Algebra and Optimization with Applications to Machine Learning.

[17] Rocco A. Servedio et al. Bounded Independence Fools Halfspaces, 2009, FOCS.

[18] P. Erdős. On a lemma of Littlewood and Offord, 1945.

[19] Sivaraman Balakrishnan et al. Robust estimation via robust gradient estimation, 2018, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[20] Rocco A. Servedio et al. Learning Halfspaces with Malicious Noise, 2009, ICALP.

[21] Terence Tao et al. The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi, 2010, Combinatorica.

[22] D. Lubinsky. A Survey of Weighted Approximation for Exponential Weights, 2007, arXiv:math/0701099.

[23] Ilias Diakonikolas et al. Efficient Algorithms and Lower Bounds for Robust Linear Regression, 2018, SODA.

[24] Yu Cheng et al. High-Dimensional Robust Mean Estimation in Nearly-Linear Time, 2018, SODA.

[25] Holger Sambale et al. Concentration inequalities for polynomials in α-sub-exponential random variables, 2019, Electronic Journal of Probability.

[26] Tengyu Ma et al. Polynomial-Time Tensor Decompositions with Sum-of-Squares, 2016, FOCS.

[27] Santosh S. Vempala et al. Agnostic Estimation of Mean and Covariance, 2016, FOCS.

[28] Eric Price et al. Compressed Sensing with Adversarial Sparse Noise via L1 Regression, 2018, SOSA.

[29] Prasad Raghavendra et al. List Decodable Learning via Sum of Squares, 2019, SODA.

[30] Ankur Moitra et al. Noisy tensor completion via the sum-of-squares hierarchy, 2015, Mathematical Programming.

[31] Constantine Caramanis et al. Alternating Minimization for Mixed Linear Regression, 2013, ICML.

[32] Jerry Li et al. Robustly Learning a Gaussian: Getting Optimal Error, Efficiently, 2017, SODA.

[33] Ankur Moitra et al. Settling the Polynomial Learnability of Mixtures of Gaussians, 2010, FOCS.

[34] Yurii Nesterov. Squared Functional Systems and Optimization Problems, 2000.

[35] Santosh S. Vempala et al. A discriminative framework for clustering via similarity functions, 2008, STOC.

[36] Anima Anandkumar et al. Provable Tensor Methods for Learning Mixtures of Generalized Linear Models, 2014, AISTATS.

[37] Inderjit S. Dhillon et al. Mixed Linear Regression with Multiple Components, 2016, NIPS.