Efficient Algorithms and Lower Bounds for Robust Linear Regression

We study the problem of high-dimensional linear regression in a robust model where an $\epsilon$-fraction of the samples can be adversarially corrupted. We focus on the fundamental setting where the covariates of the uncorrupted samples are drawn from a Gaussian distribution $\mathcal{N}(0, \Sigma)$ on $\mathbb{R}^d$. We give nearly tight upper bounds and computational lower bounds for this problem. Specifically, our main contributions are as follows: For the case where the covariance matrix is known to be the identity, we give a sample near-optimal and computationally efficient algorithm that outputs a candidate hypothesis vector $\widehat{\beta}$ approximating the unknown regression vector $\beta$ within $\ell_2$-norm $O(\epsilon \log(1/\epsilon) \sigma)$, where $\sigma$ is the standard deviation of the random observation noise. An error of $\Omega(\epsilon \sigma)$ is information-theoretically necessary, even with infinite sample size. Prior work gave an algorithm for this problem with sample complexity $\tilde{\Omega}(d^2/\epsilon^2)$ whose error guarantee scales with the $\ell_2$-norm of $\beta$. For the case of unknown covariance, we show that we can efficiently achieve the same error guarantee as in the known covariance case using an additional $\tilde{O}(d^2/\epsilon^2)$ unlabeled examples. On the other hand, an error of $O(\epsilon \sigma)$ can be information-theoretically attained with $O(d/\epsilon^2)$ samples. We prove a Statistical Query (SQ) lower bound providing evidence that this quadratic tradeoff in the sample size is inherent. More specifically, we show that any polynomial-time SQ learning algorithm for robust linear regression (in Huber's contamination model) with estimation complexity $O(d^{2-c})$, where $c>0$ is an arbitrarily small constant, must incur an error of $\Omega(\sqrt{\epsilon} \sigma)$.
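To make the contamination model concrete, the following is a minimal Python sketch (not from the paper) of the setting above: inliers satisfy $y = \langle \beta, x \rangle + \eta$ with $x \sim \mathcal{N}(0, I_d)$ and noise of standard deviation $\sigma$, and an $\epsilon$-fraction of samples is replaced by adversarially chosen points. The `trimmed_ols` baseline is an illustrative iterative-trimming heuristic assumed for this example; it is not the algorithm analyzed in this work.

```python
import numpy as np

# Sketch of the robust linear regression setting (illustration only):
# inliers: y = <beta, x> + noise, x ~ N(0, I_d), noise ~ N(0, sigma^2);
# an eps-fraction of samples is adversarially replaced.
rng = np.random.default_rng(0)
d, n, eps, sigma = 20, 5000, 0.1, 1.0

beta = rng.normal(size=d)
beta /= np.linalg.norm(beta)

X = rng.normal(size=(n, d))
y = X @ beta + sigma * rng.normal(size=n)

# Adversary: shift an eps-fraction of covariates along a fixed direction and
# set their labels inconsistently, which biases ordinary least squares.
m = int(eps * n)
X[:m] = rng.normal(size=(m, d)) + 10.0
y[:m] = -10.0

def ols(X, y):
    """Ordinary least squares via a least-squares solve."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def trimmed_ols(X, y, eps, iters=10):
    """Naive robust baseline: alternate least squares with removal of the
    eps-fraction of samples having the largest residuals. A simple
    illustration of the trimming/filtering idea, not the paper's algorithm."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(iters):
        b = ols(X[keep], y[keep])
        resid = np.abs(y - X @ b)
        cutoff = np.quantile(resid, 1.0 - eps)
        keep = resid <= cutoff
    return ols(X[keep], y[keep])

for name, b_hat in [("OLS", ols(X, y)), ("trimmed", trimmed_ols(X, y, eps))]:
    print(f"{name}: ||beta_hat - beta||_2 = {np.linalg.norm(b_hat - beta):.3f}")
```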
