High Dimensional Robust Sparse Regression

We provide a novel algorithm (to the best of our knowledge, the first) for high dimensional sparse regression with a constant fraction of corruptions in the explanatory and/or response variables. Our algorithm recovers the true sparse parameters with sub-linear sample complexity in the presence of a constant fraction of arbitrary corruptions. Our main contribution is a robust variant of Iterative Hard Thresholding. Using this, we provide accurate estimators: when the covariance matrix in sparse regression is the identity, our error guarantee is near information-theoretically optimal. We then address robust sparse regression with an unknown structured covariance matrix. We propose a filtering algorithm built on a novel randomized outlier removal technique for robust sparse mean estimation, which may be of interest in its own right: the filtering algorithm is flexible enough to handle unknown covariance, and it is order-wise more computationally efficient than the ellipsoid algorithm. With sub-linear sample complexity, our algorithm achieves the best known (and first) error guarantee in this setting. We demonstrate the effectiveness of our approach on large-scale sparse regression problems with arbitrary corruptions.
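To make the idea of a robust variant of Iterative Hard Thresholding concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes squared loss and replaces the paper's filtering-based robust gradient estimate with a simple coordinate-wise trimmed mean of per-sample gradients. The function names (`hard_threshold`, `trimmed_mean`, `robust_iht`) and the parameters `trim_frac`, `step`, and `n_iter` are hypothetical choices made for illustration.

```python
import numpy as np


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
    return out


def trimmed_mean(G, trim_frac):
    """Coordinate-wise trimmed mean over the rows of G.

    Stand-in robust aggregator (an assumption of this sketch): in each
    coordinate, drop the floor(trim_frac * n) smallest and largest values.
    """
    n = G.shape[0]
    t = int(np.floor(trim_frac * n))
    S = np.sort(G, axis=0)
    return S[t:n - t].mean(axis=0)


def robust_iht(X, y, k, trim_frac=0.1, step=1.0, n_iter=50):
    """Hard thresholding iterations driven by a robustly aggregated gradient."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        residual = X @ beta - y
        # Per-sample gradients of 0.5 * (x_i^T beta - y_i)^2, one row per sample.
        grads = X * residual[:, None]
        g = trimmed_mean(grads, trim_frac)
        # Gradient step followed by projection onto k-sparse vectors.
        beta = hard_threshold(beta - step * g, k)
    return beta
```

With `trim_frac=0` the aggregator reduces to the ordinary sample mean, so the loop becomes plain Iterative Hard Thresholding; a positive `trim_frac` discards extreme per-sample gradient values in each coordinate before averaging. The trimmed mean carries none of the guarantees stated in the abstract and is used here only because it is short to write down.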
