A data-driven interactome of synergistic genes improves network-based cancer outcome prediction

Robustly predicting outcome for cancer patients from gene expression is an important challenge on the road to better personalized treatment. Network-based outcome predictors (NOPs), which consider the cellular wiring diagram in the classification, hold much promise to improve performance, stability and interpretability of identified marker genes. However, reports on the efficacy of NOPs are conflicting; some suggest, for instance, that random networks perform on par with networks that describe biologically relevant interactions. In this paper we turn the prediction problem around: instead of using a given biological network in the NOP, we aim to identify the network of genes that truly improves outcome prediction. To this end, we propose SyNet, a gene network constructed ab initio from synergistic gene pairs derived from survival-labelled gene expression data. To obtain SyNet, we evaluate synergy for all 69 million pairwise combinations of genes, resulting in a network that is specific to the dataset and phenotype under study and can be used in a NOP model. We evaluated SyNet and 11 other networks on a compendium dataset of >4000 survival-labelled breast cancer samples. For this purpose, we used cross-study validation, which more closely emulates real-world application of these outcome predictors. We find that SyNet is the only network that truly improves performance, stability and interpretability in several existing NOPs. We show that SyNet overlaps significantly with existing gene networks, and can be confidently predicted (~85% AUC) from graph-topological descriptions of these networks, in particular the breast tissue-specific network. Due to its data-driven nature, SyNet is not biased to well-studied genes and thus facilitates post-hoc interpretation. We find that SyNet is highly enriched for known breast cancer genes and genes related to e.g. histological grade and tamoxifen resistance, suggestive of a role in determining breast cancer outcome.
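The two procedural ideas in the abstract, scoring every gene pair for synergy and validating across studies, can be illustrated with a minimal Python sketch. This is not the authors' implementation. In particular, the synergy definition used here (pair performance divided by the better of the two single-gene performances) is an assumption made for illustration, and the names `leave_one_study_out`, `synergy` and `rank_pairs` are hypothetical.

```python
from itertools import combinations


def leave_one_study_out(study_labels):
    """Yield (held_out_study, train_idx, test_idx), holding out each study once.

    Cross-study validation: a model is trained on all studies but one
    and tested on the study it has never seen.
    """
    for held_out in sorted(set(study_labels)):
        train_idx = [i for i, s in enumerate(study_labels) if s != held_out]
        test_idx = [i for i, s in enumerate(study_labels) if s == held_out]
        yield held_out, train_idx, test_idx


def synergy(pair_score, score_a, score_b):
    """ASSUMED definition: pair performance relative to the better single gene."""
    return pair_score / max(score_a, score_b)


def rank_pairs(genes, single_score, pair_score):
    """Score every unordered gene pair and sort by descending synergy.

    single_score(gene) and pair_score(gene_a, gene_b) are caller-supplied
    (hypothetical) functions returning a predictive performance such as AUC.
    """
    scored = [
        (synergy(pair_score(a, b), single_score(a), single_score(b)), a, b)
        for a, b in combinations(genes, 2)
    ]
    return sorted(scored, reverse=True)
```

The number of pairs grows as n(n-1)/2 in the number of genes n, which is why an exhaustive evaluation reaches tens of millions of combinations. Pairs whose score exceeds 1 under the assumed definition predict outcome better together than either gene does alone.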
