Simultaneous feature selection and outlier detection with optimality guarantees

Sparse estimation methods that tolerate outliers have been studied extensively over the last decade. We contribute to this line of research by considering high-dimensional regression problems contaminated by multiple mean-shift outliers that affect both the response and the design matrix. We develop a general framework for this class of problems and propose using mixed-integer programming to perform feature selection and outlier detection simultaneously, with provable optimality guarantees. We characterize the theoretical properties of our approach: a necessary and sufficient condition for the robustly strong oracle property, which allows the number of features to increase exponentially with the sample size; the optimal estimation of the parameters; and the breakdown point of the resulting estimates. Moreover, we provide computationally efficient procedures to tune the integer constraints and to warm-start the algorithm. Numerical simulations and an application investigating the relationship between the human microbiome and childhood obesity show that our proposal outperforms existing heuristic methods.
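
To make the idea concrete, one common way to write a mean-shift outlier model with cardinality constraints as a mixed-integer program is sketched below. This is an illustrative formulation under stated assumptions, not necessarily the exact program used in the paper: the symbols $\phi$ (case-specific mean shifts), $z^{\beta}, z^{\phi}$ (binary indicators), the big-$M$ bounds $M_\beta, M_\phi$, and the budgets $k_p, k_n$ are all notation introduced here and do not appear in the abstract.

```latex
% Assumed model: y = X\beta + \phi + \varepsilon, where \phi_i \neq 0 flags case i as an outlier.
\begin{aligned}
\min_{\beta,\,\phi,\,z^{\beta},\,z^{\phi}} \quad
  & \lVert y - X\beta - \phi \rVert_2^2 \\
\text{s.t.} \quad
  & -M_\beta\, z^{\beta}_j \le \beta_j \le M_\beta\, z^{\beta}_j, && j = 1,\dots,p, \\
  & -M_\phi\, z^{\phi}_i \le \phi_i \le M_\phi\, z^{\phi}_i, && i = 1,\dots,n, \\
  & \sum_{j=1}^{p} z^{\beta}_j \le k_p, \qquad \sum_{i=1}^{n} z^{\phi}_i \le k_n, \\
  & z^{\beta} \in \{0,1\}^{p}, \qquad z^{\phi} \in \{0,1\}^{n}.
\end{aligned}
```

In this sketch, $k_p$ bounds the number of selected features and $k_n$ bounds the number of cases flagged as outliers. These play the role of the integer constraints that the abstract says are tuned, and a feasible $(\beta, \phi)$ from a heuristic method could serve as a warm start.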
