NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA

We consider estimation and variable selection in high-dimensional Cox regression when prior knowledge of the relationships among the covariates, described by a network or graph, is available. A limitation of existing methodology for survival analysis with high-dimensional genomic data is that it often ignores the wealth of structural information available about biological processes, such as regulatory networks and pathways. To incorporate such prior network information into the analysis of genomic data, we propose a network-based regularization method for high-dimensional Cox regression. It uses an ℓ1-penalty to induce sparsity in the regression coefficients and a quadratic Laplacian penalty to encourage smoothness between the coefficients of neighboring variables on a given network. The proposed method is implemented by an efficient coordinate descent algorithm. In the setting where the dimensionality p can grow exponentially fast with the sample size n, we establish model selection consistency and estimation bounds for the proposed estimators. The theoretical results provide insight into the gain from taking the network structure into account. Extensive simulation studies indicate that our method outperforms the Lasso and the elastic net in terms of variable selection accuracy and stability. We apply our method to a breast cancer gene expression study and identify several biologically plausible subnetworks and pathways that are associated with distant metastasis of breast cancer.
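As a rough illustration of the kind of objective the abstract describes, the sketch below combines a Cox negative log partial likelihood with an ℓ1 term and a quadratic Laplacian term. It is not the authors' implementation. The function names, the edge-list format `(u, v, weight)`, the tuning parameters `lam1` and `lam2`, the factor 1/2 on the quadratic term, the use of the unnormalized Laplacian L = D − A, and the absence of tie handling are all assumptions made here for illustration; a degree-normalized Laplacian is a common alternative.

```python
import math


def neg_log_partial_likelihood(beta, X, time, event):
    """Average negative log partial likelihood for the Cox model.

    Assumes no tied event times (no tie correction is applied).
    X is a list of covariate rows; event[i] is 1 for an observed event
    and 0 for a censored observation.
    """
    n = len(X)
    # Linear predictor x_i^T beta for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i in range(n):
        if event[i]:
            # Risk set: subjects still under observation at time[i].
            risk = sum(math.exp(eta[j]) for j in range(n) if time[j] >= time[i])
            total += eta[i] - math.log(risk)
    return -total / n


def laplacian_penalty(beta, edges):
    """Quadratic form beta^T L beta with L = D - A (an assumed choice).

    Written edge by edge: sum over edges of w * (beta_u - beta_v)^2.
    """
    return sum(w * (beta[u] - beta[v]) ** 2 for u, v, w in edges)


def objective(beta, X, time, event, edges, lam1, lam2):
    """Penalized objective: loss + lam1 * ||beta||_1 + (lam2 / 2) * beta^T L beta."""
    l1 = sum(abs(b) for b in beta)
    loss = neg_log_partial_likelihood(beta, X, time, event)
    return loss + lam1 * l1 + 0.5 * lam2 * laplacian_penalty(beta, edges)


# Toy data (made up for illustration): 3 subjects, 3 covariates,
# and a chain network 0 - 1 - 2 with unit edge weights.
X = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]
time = [1.0, 2.0, 3.0]
event = [1, 1, 0]
edges = [(0, 1, 1.0), (1, 2, 1.0)]

value = objective([1.0, 1.0, 0.0], X, time, event, edges, lam1=0.1, lam2=0.2)
```

The edge-wise form of the quadratic term makes the smoothing effect visible: coefficients of two linked variables are penalized only through their difference, so equal coefficients on neighboring nodes incur no Laplacian penalty, while the ℓ1 term still shrinks each coefficient toward zero.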
