OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers

Abstract The prevalence of neutral mutations in cancer cell populations makes it difficult to distinguish cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic ability of somatic mutations and cancer genes, we constructed OncoVar (https://oncovar.org/), a platform that employs published bioinformatics algorithms and incorporates known driver events to identify driver mutations and driver genes. By reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in the International Cancer Genome Consortium (ICGC), we identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence. OncoVar provides four points of view, ‘Mutation’, ‘Gene’, ‘Pathway’ and ‘Cancer’, to help researchers visualize the relationships between cancers and driver variants. Importantly, the identification of actionable driver alterations provides promising druggable targets and repurposing opportunities for combination therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help to rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.
