Hybrid L1/2 + 2 method for gene selection in the Cox proportional hazards model.

BACKGROUND AND OBJECTIVE An important problem in genomic research is identifying, from tens of thousands of genes, the significant genes that are related to survival. Although the Cox proportional hazards model is a conventional survival analysis method, it does not itself perform gene selection. METHODS In this paper, we extend the hybrid L1/2 + 2 regularization (HLR) idea to the censored survival setting and propose a new sparse Cox model based on the HLR method. We develop two algorithms for solving the HLR-penalized Cox model: one is a coordinate descent algorithm with the HLR thresholding operator, and the other is a weight iteration method. RESULTS The proposed method was tested on six public mRNA data sets covering several kinds of cancer: AML, breast cancer, pancreatic cancer, DLBCL and melanoma. The test results indicate that the method identified a small but essential subset of genes while giving the best or equivalent predictive performance compared with some popular methods. CONCLUSIONS The empirical and simulation results imply that the proposed strategy is highly competitive among several state-of-the-art methods for studying high-dimensional survival data.
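
As a rough illustration of the kind of objective an HLR-penalized Cox model minimizes, the sketch below combines the negative Cox log partial likelihood with a penalty that mixes an L1/2 term and a squared L2 term. The abstract does not state the exact parameterization, so the form `lam * (alpha * sum(|beta|^(1/2)) + (1 - alpha) * sum(beta^2))`, the function names, and the parameters `lam` and `alpha` are all assumptions made here for illustration. The likelihood also assumes no tied event times. This is not the authors' coordinate descent or weight iteration solver, only the function such a solver would evaluate.

```python
import numpy as np

def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox log partial likelihood (assumes no tied event times)."""
    order = np.argsort(-time)          # sort subjects by decreasing time
    eta = X[order] @ beta              # linear predictors in that order
    ev = event[order]
    # Running log-sum-exp: entry i covers all subjects still at risk at time i
    log_risk = np.logaddexp.accumulate(eta)
    return -np.sum((eta - log_risk)[ev == 1])

def hlr_penalty(beta, lam, alpha):
    """Assumed HLR form: lam * (alpha * sum|b|^(1/2) + (1 - alpha) * sum b^2)."""
    l_half = np.sum(np.sqrt(np.abs(beta)))
    l_two = np.sum(beta ** 2)
    return lam * (alpha * l_half + (1.0 - alpha) * l_two)

def hlr_cox_objective(beta, X, time, event, lam, alpha):
    """Penalized objective that a solver would minimize over beta."""
    return neg_log_partial_likelihood(beta, X, time, event) + hlr_penalty(beta, lam, alpha)

# Tiny synthetic example: 3 subjects, 2 hypothetical genes, all events observed
X = np.array([[0.5, -1.0], [1.5, 0.3], [-0.7, 2.0]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 1])
value_at_zero = hlr_cox_objective(np.zeros(2), X, time, event, lam=0.1, alpha=0.5)
```

With all coefficients at zero the penalty vanishes and every subject in a risk set is equally likely to fail, so the objective reduces to the sum of the log risk-set sizes over the observed events.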
