dbCID: a manually curated resource for exploring the driver indels in human cancer

While recent advances in next-generation sequencing technologies have enabled the creation of a multitude of databases for cancer genomics research, there is not yet a comprehensive database focused on the annotation of driver indels (insertions and deletions). We have therefore developed the database of Cancer driver InDels (dbCID), a collection of known coding indels that are likely to be involved in cancer development, progression or therapy. dbCID contains experimentally supported and putative driver indels derived from manual curation of the literature and is freely available online at http://bioinfo.ahu.edu.cn:8080/dbCID. Using the data deposited in dbCID, we summarized the features of driver indels at four levels (gene, DNA, transcript and protein) by comparing them with putative neutral indels. We found that most of the genes containing driver indels in dbCID are known cancer genes that play a role in tumorigenesis. Contrary to expectation, the sequences affected by driver frameshift indels are not larger than those affected by neutral ones. In addition, both frameshift and inframe driver indels tend to disrupt highly conserved regions, in DNA sequences as well as in protein domains. Finally, we developed a computational method for discriminating cancer driver frameshift indels from neutral ones based on the data deposited in dbCID. The proposed method outperformed other widely used non-cancer-specific predictors on an external test set, which demonstrates the usefulness of the data deposited in dbCID. We hope that dbCID will serve as a benchmark for improving and evaluating prediction algorithms, and that the characteristics summarized here will assist in investigating the mechanisms of the association between indels and cancer.
