A selective review of robust variable selection with applications in bioinformatics

A vast amount of data has been, and continues to be, generated in bioinformatics studies. In the analysis of such data, standard modeling approaches can be challenged by heavy-tailed errors and outliers in the response variables, contamination in the predictors (which may be caused by, for instance, technical problems in microarray gene expression studies), model mis-specification, and other factors. Robust methods are needed to tackle these challenges. When there are a large number of predictors, variable selection can be as important as estimation. As a generic variable selection and regularization tool, penalization has been extensively adopted. In this article, we provide a selective review of robust penalized variable selection approaches designed especially for high-dimensional data from bioinformatics and biomedical studies. We discuss the robust loss functions, penalty functions, and computational algorithms. The theoretical properties and implementation are also briefly examined, and applications of the robust penalization approaches in representative bioinformatics and biomedical studies are illustrated.
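
To make the combination of a robust loss and a penalty concrete, the following is a minimal sketch, not code taken from the reviewed methods: it minimizes the average Huber loss plus a lasso (L1) penalty by proximal gradient descent, using only the Python standard library. The function names (huber_grad, soft_threshold, huber_lasso), the tuning values (delta = 1.345, lam = 0.1), and the simulated data with gross outliers in the response are all illustrative assumptions.

```python
import random


def huber_grad(r, delta):
    """Derivative of the Huber loss with respect to the residual r."""
    if abs(r) <= delta:
        return r  # quadratic region
    return delta if r > 0 else -delta  # linear region: bounded influence


def soft_threshold(z, t):
    """Proximal operator of t * |z|, i.e. the lasso shrinkage step."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def huber_lasso(X, y, lam, delta=1.345, n_iter=300):
    """Proximal gradient descent for
    (1/n) * sum_i huber(y_i - x_i' beta) + lam * ||beta||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    # The Huber loss has second derivative <= 1, so the gradient of the
    # smooth part is Lipschitz with constant at most ||X||_F^2 / n.
    lipschitz = sum(x * x for row in X for x in row) / n
    step = 1.0 / lipschitz
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p))
                 for i in range(n)]
        grad = [-sum(huber_grad(resid[i], delta) * X[i][j]
                     for i in range(n)) / n
                for j in range(p)]
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam)
                for j in range(p)]
    return beta


# Simulated example; every value below is an assumption for illustration.
rng = random.Random(0)
n, p = 200, 5
true_beta = [2.0, 0.0, 0.0, -1.5, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + rng.gauss(0.0, 0.5)
     for i in range(n)]
for i in range(20):
    y[i] += 15.0  # contaminate 10% of the responses with gross outliers

beta_hat = huber_lasso(X, y, lam=0.1)
print([round(b, 3) for b in beta_hat])
```

Because the Huber derivative is bounded by delta, each contaminated response has limited influence on the gradient, while the soft-thresholding step can set small coefficients exactly to zero. Other pairings, such as a quantile (check) loss or a nonconvex penalty, would need a different algorithm than this sketch.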
