Adaptive Bayesian SLOPE -- High-dimensional Model Selection with Missing Values

Variable selection in high-dimensional data with missing values is a major challenge, and very few methods are available to address it. Here we propose a new method, adaptive Bayesian SLOPE, which extends the SLOPE method of sorted $\ell_1$ regularization within a Bayesian framework and makes it possible to simultaneously estimate the parameters and select variables in large data sets despite missing values. The method follows the idea of the Spike and Slab LASSO, but replaces the Laplace mixture prior with the frequentist-motivated "SLOPE" prior, which targets control of the False Discovery Rate (FDR). The regression parameters and the noise variance are estimated using a stochastic approximation EM algorithm, which makes it possible to incorporate missing values as well as latent model parameters, such as the signal magnitude and its sparsity. Extensive simulations show good behavior in terms of power, FDR and estimation bias under a wide range of scenarios. Finally, we consider an application to severely traumatized patients from Paris hospitals, where the goal is to predict platelet levels, and demonstrate excellent predictive capabilities in addition to the selection of relevant variables, which is crucial for interpretation. The methodology is implemented in the R package ABSLOPE, which incorporates C++ code to improve computational efficiency.
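
For readers unfamiliar with sorted $\ell_1$ regularization, the following minimal sketch may help. It computes the standard SLOPE penalty $\sum_i \lambda_i |\beta|_{(i)}$, where $|\beta|_{(1)} \ge \dots \ge |\beta|_{(p)}$ are the sorted coefficient magnitudes and $\lambda_1 \ge \dots \ge \lambda_p \ge 0$. The Benjamini-Hochberg-style weight sequence, the function names `bh_lambda` and `sorted_l1_norm`, and the level `q = 0.1` are assumptions taken from the general SLOPE literature for illustration; they are not the ABSLOPE prior itself and not the API of the ABSLOPE package.

```python
from statistics import NormalDist


def bh_lambda(p, q=0.1):
    """Hypothetical helper: BH-style decreasing weights,
    lambda_i = Phi^{-1}(1 - i * q / (2 * p)) for i = 1..p."""
    nd = NormalDist()
    return [nd.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1_norm(beta, lam):
    """Sorted l1 norm: pair the largest |beta| with the largest weight."""
    if len(beta) != len(lam):
        raise ValueError("beta and lam must have the same length")
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lam, magnitudes))


# Illustrative coefficient vector (made-up values, not from the paper).
beta = [0.0, -3.0, 0.5, 2.0]
lam = bh_lambda(len(beta), q=0.1)
penalty = sorted_l1_norm(beta, lam)
```

Because larger coefficients receive larger weights, the penalty adapts to the number of nonzero entries; with a constant weight sequence it reduces to the ordinary LASSO penalty.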
