Identifying Important Predictors in Large Data Bases - Multiple Testing and Model Selection

This is a chapter of the forthcoming Handbook of Multiple Testing. We consider a variety of model selection strategies in a high-dimensional setting, where the number of potential predictors p is large compared to the number of available observations n. In particular, modifications of information criteria that remain suitable when p > n are introduced and compared with penalized likelihood methods, notably SLOPE and SLOBE. The focus is on methods that control the false discovery rate (FDR) of model identification. Theoretical results are provided for both model identification and prediction, and various simulation studies illustrate how the methods perform in different situations.
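For concreteness, the SLOPE penalty referred to above can be written down explicitly. The display below is a minimal sketch for the standard linear model y = Xb + e with design matrix X and a non-increasing tuning sequence; the Benjamini-Hochberg-type choice of the sequence mentioned afterwards is the one studied in the SLOPE literature for FDR control under orthogonal designs.

\[
\hat{\beta}_{\mathrm{SLOPE}} \;=\; \arg\min_{b \in \mathbb{R}^{p}} \ \tfrac{1}{2}\,\lVert y - Xb \rVert_2^2 \;+\; \sum_{i=1}^{p} \lambda_i\, \lvert b \rvert_{(i)},
\qquad
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0,
\]

where \(\lvert b \rvert_{(1)} \ge \cdots \ge \lvert b \rvert_{(p)}\) are the ordered absolute values of the coordinates of b. When all \(\lambda_i\) are equal, SLOPE reduces to the LASSO; with the BH-type sequence \(\lambda_i = \sigma\,\Phi^{-1}(1 - iq/(2p))\) it controls the FDR at level q for orthogonal designs.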
