Including network knowledge into Cox regression models for biomarker signature discovery

Discovery of prognostic and diagnostic biomarker gene signatures for diseases such as cancer is seen as a major step toward better personalized medicine. Various methods have been proposed for this purpose during the last decade. However, two obstacles keep gene signatures from becoming a standard tool in clinical diagnosis: their typically low reproducibility and the difficulty of interpreting them biologically. To address these obstacles, recent years have seen growing interest in approaches that integrate information from molecular interaction networks. Most of these methods focus on classification problems, that is, on learning a model from data that assigns patients to distinct clinical groups. Far less has been published on approaches that predict a patient's event risk. In this paper, we investigate eight methods that integrate network information into multivariable Cox proportional hazards models for risk prediction in breast cancer. We compare the prediction performance of the tested algorithms via cross-validation as well as across different datasets. In addition, we assess the stability and interpretability of the resulting gene signatures. We find GeneRank-based filtering to be a simple, computationally cheap and highly predictive technique for integrating network information into event time prediction models. Signatures derived via this method are highly reproducible.
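
To make the idea of GeneRank-based filtering more concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes the commonly stated GeneRank recursion r_j = (1 - d) ex_j + d Σ_i w_ij r_i / deg_i on an undirected network, where ex_j is a non-negative per-gene score (for example an absolute univariate association statistic) and d is a damping factor. The function names `generank` and `top_k_genes`, the toy network, the scores and the choice d = 0.5 are all illustrative assumptions.

```python
def generank(adjacency, scores, d=0.5, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for the assumed GeneRank recursion.

    adjacency: symmetric 0/1 matrix (list of lists) of the gene network.
    scores:    per-gene evidence; absolute values are used as ex.
    d:         damping factor in [0, 1); d = 0 ignores the network.
    """
    n = len(scores)
    ex = [abs(s) for s in scores]
    # Node degrees; isolated genes get degree 1 to avoid division by zero.
    deg = [sum(row) or 1.0 for row in adjacency]
    r = ex[:]
    for _ in range(max_iter):
        new_r = [
            (1.0 - d) * ex[j]
            + d * sum(adjacency[i][j] * r[i] / deg[i] for i in range(n))
            for j in range(n)
        ]
        # Stop once no rank changes by more than the tolerance.
        if max(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


def top_k_genes(genes, ranks, k):
    """Keep the k genes with the highest rank (the filtering step)."""
    order = sorted(range(len(genes)), key=lambda j: ranks[j], reverse=True)
    return [genes[j] for j in order[:k]]


# Hypothetical toy network: path A - B - C, plus an isolated gene D.
genes = ["A", "B", "C", "D"]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
scores = [1.0, 0.0, 1.0, 0.5]  # made-up univariate scores

ranks = generank(adjacency, scores, d=0.5)
selected = top_k_genes(genes, ranks, k=3)
```

In this toy example gene B has no evidence of its own but is connected to two high-scoring genes, so it ends up ranked above the isolated gene D. In a filtering workflow of this kind, only the selected genes would then be passed as covariates to a (possibly penalized) Cox model.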
