De novo pathway-based biomarker identification

Abstract Gene expression profiles have been widely discussed as an aid to guiding therapy by predicting disease outcome for patients with complex diseases such as cancer. However, prediction models built on single-gene (SG) features show poor stability and poor performance on independent datasets. Attempts to mitigate these drawbacks have led to network-based approaches that integrate pathway information to produce meta-gene (MG) features. So far, however, MG approaches have addressed only the two-class problem of predicting good versus poor outcome. Stratifying patients by molecular subtype can provide a more detailed view of the disease and lead to more personalized therapies. We propose and discuss a novel MG approach based on de novo pathways, which are used here for the first time as features in a multi-class setting to predict cancer subtypes. A comprehensive evaluation on a large cohort of breast cancer samples from The Cancer Genome Atlas (TCGA) revealed that MG models are considerably more stable than SG models, while also providing valuable insight into the cancer hallmarks that drive them. In addition, when tested on an independent non-TCGA benchmark dataset, models built on MG features consistently outperformed SG models. We provide an easy-to-use web service at http://pathclass.compbio.sdu.dk where users can upload their own gene expression datasets from breast cancer studies and obtain subtype predictions from all the classifiers.
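
To make the notion of a meta-gene feature concrete, the following minimal sketch collapses the expression of each pathway's member genes into a single value per sample. It is an illustration under stated assumptions, not the method evaluated in this work: the aggregation rule (mean of z-scored expression), the function name `meta_gene_features`, and the toy genes and pathways are all hypothetical.

```python
import numpy as np

def meta_gene_features(expr, genes, pathways):
    """Average the z-scored expression of each pathway's member genes.

    expr: (samples x genes) array; genes: column names of expr;
    pathways: dict mapping a pathway name to a list of gene names.
    Assumption: mean aggregation, chosen only for illustration.
    """
    index = {g: i for i, g in enumerate(genes)}
    std = expr.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    z = (expr - expr.mean(axis=0)) / std
    names, cols = [], []
    for name in sorted(pathways):
        members = [index[g] for g in pathways[name] if g in index]
        if not members:  # skip pathways with no measured genes
            continue
        names.append(name)
        cols.append(z[:, members].mean(axis=1))
    return names, np.column_stack(cols)

# Toy data invented for illustration: 4 samples, 4 genes, 2 pathways.
genes = ["G1", "G2", "G3", "G4"]
expr = np.array([[1.0, 2.0, 5.0, 0.0],
                 [2.0, 3.0, 4.0, 1.0],
                 [3.0, 4.0, 3.0, 0.0],
                 [4.0, 5.0, 2.0, 1.0]])
pathways = {"P_a": ["G1", "G2"], "P_b": ["G3", "G4"]}
names, mg = meta_gene_features(expr, genes, pathways)
```

The resulting matrix `mg` has one column per pathway and could be passed to any multi-class classifier in place of the single-gene matrix `expr`.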
