High Dimensional Robust Estimation of Sparse Models via Trimmed Hard Thresholding

We study the problem of sparsity-constrained M-estimation with arbitrary corruptions to both explanatory and response variables in the high-dimensional regime, where the number of variables d exceeds the sample size n. Our main contribution is a highly efficient gradient-based optimization algorithm that we call Trimmed Hard Thresholding: a robust variant of Iterative Hard Thresholding (IHT) that replaces the average gradient with a trimmed mean. Our algorithm handles a wide class of sparsity-constrained M-estimation problems and tolerates a nearly dimension-independent fraction of arbitrarily corrupted samples. More specifically, when the corrupted fraction is ≲ 1/(√k log(nd)), where k is the sparsity of the parameter, we obtain accurate estimation and model-selection guarantees with optimal sample complexity. Furthermore, we extend our algorithm to sparse Gaussian graphical model (precision matrix) estimation via a neighborhood selection approach. We demonstrate the effectiveness of robust estimation for sparse linear regression, sparse logistic regression, and sparse precision matrix estimation on synthetic and real-world US equities data.
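The core idea described above, IHT with the sample-average gradient replaced by a coordinate-wise trimmed mean of per-sample gradients, can be illustrated with a minimal sketch for the sparse linear regression case. This is our own illustrative code, not the paper's implementation; the function names, step size, and trimming level are assumptions chosen for the example.

```python
import numpy as np

def trimmed_mean(grads, trim_frac):
    """Coordinate-wise trimmed mean: for each coordinate, drop the
    smallest and largest trim_frac fraction of per-sample values,
    then average the rest."""
    n = grads.shape[0]
    cut = int(np.floor(trim_frac * n))
    s = np.sort(grads, axis=0)
    if cut > 0:
        s = s[cut:n - cut]
    return s.mean(axis=0)

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def trimmed_iht(X, y, k, step=0.5, trim_frac=0.1, iters=100):
    """Sketch of Trimmed Hard Thresholding for sparse linear regression:
    standard IHT, except the gradient of the squared loss is estimated
    by a coordinate-wise trimmed mean over per-sample gradients, so a
    small fraction of arbitrarily corrupted samples is screened out."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        residual = X @ theta - y                  # per-sample residuals
        per_sample_grads = X * residual[:, None]  # (n, d) per-sample gradients
        g = trimmed_mean(per_sample_grads, trim_frac)
        theta = hard_threshold(theta - step * g, k)
    return theta
```

In a clean-data run the trimmed mean behaves like the ordinary average, so the iteration reduces to plain IHT; with a few grossly corrupted responses, the corrupted samples' extreme per-sample gradients fall into the trimmed tails and are ignored, which is the mechanism behind the robustness guarantee sketched in the abstract.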
