A network-based deep learning methodology for stratification of tumor mutations

MOTIVATION Tumor stratification has a wide range of biomedical and clinical applications, including diagnosis, prognosis and personalized treatment. However, cancer is typically driven by combinations of mutated genes that are highly heterogeneous across patients, which makes it challenging to subdivide tumors accurately into subtypes.

RESULTS We developed a network-embedding based stratification (NES) methodology to identify clinically relevant patient subtypes from large-scale somatic mutation profiles. The central hypothesis of NES is that two tumors belong to the same subtype if their somatically mutated genes are located in similar network regions of the human interactome. We encoded the genes of the human protein-protein interactome with a network embedding approach and constructed patient vectors by integrating the somatic mutation profiles of 7,344 tumor exomes across 15 cancer types. We first trained the lightGBM classification algorithm on the patient vectors. The AUC is around 0.89 for predicting a patient's cancer type and around 0.78 for predicting tumor stage within a specific cancer type. This high classification accuracy suggests that network embedding-based patient features are reliable for separating patients. We conclude that patients with a specific cancer type can be clustered into several subtypes by applying an unsupervised clustering algorithm to the patient vectors. For 12 of the 15 cancer types, the new patient clusters (subtypes) identified by NES are significantly correlated with patient survival. In summary, this study offers a powerful network-based deep learning methodology for personalized cancer medicine.

AVAILABILITY AND IMPLEMENTATION Source code and data can be downloaded from https://github.com/ChengF-Lab/NES.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
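The central hypothesis can be illustrated with a small sketch. It is not the NES implementation: the gene embeddings below are made-up two-dimensional values, and averaging the embeddings of a patient's mutated genes is an assumed way of "integrating" a mutation profile, since the abstract does not specify the aggregation. The names `GENE_EMBEDDINGS`, `patient_vector` and `cosine_similarity` are hypothetical.

```python
from math import sqrt

# Toy 2-dimensional gene embeddings (hypothetical values). In NES these would
# come from a network embedding of the human protein-protein interactome.
GENE_EMBEDDINGS = {
    "TP53": [1.0, 0.0],
    "MDM2": [0.8, 0.2],
    "KRAS": [0.0, 1.0],
    "BRAF": [0.2, 0.8],
}


def patient_vector(mutated_genes, embeddings):
    """Average the embeddings of a patient's mutated genes.

    Averaging is an assumption made for this sketch; the abstract does not
    state how mutation profiles are combined with gene embeddings.
    """
    vectors = [embeddings[g] for g in mutated_genes if g in embeddings]
    if not vectors:
        raise ValueError("no mutated gene has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm


# Three toy patients; P1 and P2 share no mutated gene.
patients = {
    "P1": ["TP53"],
    "P2": ["MDM2"],
    "P3": ["KRAS", "BRAF"],
}
vectors = {pid: patient_vector(genes, GENE_EMBEDDINGS)
           for pid, genes in patients.items()}

sim_p1_p2 = cosine_similarity(vectors["P1"], vectors["P2"])
sim_p1_p3 = cosine_similarity(vectors["P1"], vectors["P3"])
print(f"P1 vs P2: {sim_p1_p2:.2f}, P1 vs P3: {sim_p1_p3:.2f}")
```

In this toy setup P1 and P2 have no mutated gene in common, yet their vectors are close because their genes were placed near each other in the embedding space; a raw gene-by-gene comparison would treat them as unrelated. Vectors of this kind could then be passed to a classifier or a clustering algorithm, as the abstract describes.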
