dbCPM: a manually curated database for exploring the cancer passenger mutations

Although driver mutation data sets have recently become available for developing computational methods that predict the effects of cancer mutations, benchmark sets focusing on passenger mutations are largely missing. Here, we developed a comprehensive literature-based database of Cancer Passenger Mutations (dbCPM), which contains 941 experimentally supported and 978 putative passenger mutations derived from manual curation of the literature. Using the missense mutation data, the largest group in dbCPM, we explored patterns of missense passenger mutations by comparing them with missense driver mutations, and we assessed the performance of four cancer-focused mutation effect predictors. We found that the missense passenger mutations differed significantly from drivers at multiple levels, and that several mutations appeared in both the passenger and driver categories, indicating pleiotropic functions that depend on the tumor context. Although all the predictors displayed good true positive rates, their true negative rates were relatively low because of the lack of negative training samples with experimental evidence, which suggests that a suitable negative data set is needed to develop more robust methods. We hope that dbCPM will serve as a benchmark data set for improving and evaluating prediction algorithms and as a valuable resource for the cancer research community. dbCPM is freely available online at http://bioinfo.ahu.edu.cn:8080/dbCPM.
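The true positive and true negative rates mentioned above can be illustrated with a minimal Python sketch. It assumes a binary setting in which drivers are the positive class (1) and passengers the negative class (0); the function name `tpr_tnr` and the toy labels are hypothetical and are not taken from dbCPM or from any of the evaluated predictors.

```python
from typing import Sequence, Tuple


def tpr_tnr(labels: Sequence[int], predictions: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, true negative rate).

    Assumed encoding (illustrative only): 1 = driver, 0 = passenger.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # drivers called drivers
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # drivers called passengers
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # passengers called passengers
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # passengers called drivers

    # Guard against an empty class so the division never fails.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Toy data, invented for illustration: four drivers followed by four passengers.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 1, 0, 0]

tpr, tnr = tpr_tnr(labels, predictions)
print(f"TPR = {tpr:.2f}, TNR = {tnr:.2f}")
```

In this toy case the hypothetical predictor recovers most drivers but mislabels half of the passengers as drivers. This is the kind of imbalance that a benchmark set of negative (passenger) examples is meant to expose.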
