Regularized logistic regression with adjusted adaptive elastic net for gene selection in high dimensional cancer classification

Cancer classification and gene selection in high-dimensional data are popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression with the elastic net penalty, known as the adaptive elastic net, has been applied successfully to high-dimensional cancer classification, where it estimates the gene coefficients and selects genes simultaneously. The adaptive elastic net originally uses elastic net estimates to construct its initial weights. However, these weights may not be preferable for two reasons. First, the elastic net estimator is biased in selecting genes. Second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues while also encouraging grouping effects. Results on real data indicate that AAElastic is significantly more consistent in selecting genes than the three competing regularization methods. Additionally, the classification performance of AAElastic is comparable to that of the adaptive elastic net and better than that of the other regularization methods. Thus, AAElastic can be regarded as a reliable adaptive regularized logistic regression method for high-dimensional cancer classification.
