Comparison and integration of computational methods for deleterious synonymous mutation prediction

Synonymous mutations do not change the encoded amino acid, but they may alter the structure or function of an mRNA in ways that affect gene function. Advances in next-generation sequencing have enabled the detection of numerous synonymous mutations in the human genome. Several computational models have been proposed to predict deleterious synonymous mutations, and these have greatly facilitated the development of this important field. There is therefore an urgent need to assess the state-of-the-art computational methods for deleterious synonymous mutation prediction, both to advance existing methodologies and to improve predictive performance. To this end, we systematically compared a total of 10 computational methods (including methods designed specifically for deleterious synonymous mutations and general methods for single nucleotide mutations) in terms of the algorithms used, the features calculated, performance evaluation and software usability. In addition, we constructed two carefully curated independent test datasets and used them to assess the robustness and scalability of these methods for identifying deleterious synonymous mutations. To improve predictive performance, we established an ensemble model, named Prediction of Deleterious Synonymous Mutation (PrDSM), which averages the scores generated by the three most accurate predictors. Our benchmark tests demonstrated that PrDSM outperformed the reviewed tools in predicting deleterious synonymous mutations. We implemented the ensemble model as an accessible online predictor, available at http://bioinfo.ahu.edu.cn:8080/PrDSM/. We hope that this comprehensive survey and the proposed strategy for building more accurate models will serve as a useful guide for the future development of computational methods for deleterious synonymous mutation prediction.
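The core idea of the ensemble, averaging the outputs of the three best-performing predictors, can be illustrated with a minimal Python sketch. It is not the PrDSM implementation. The abstract does not name the three component tools, so `tool_a`, `tool_b` and `tool_c` are hypothetical placeholders. The sketch also assumes that every score is already on a common 0 to 1 scale where higher means more likely deleterious. The abstract does not specify the evaluation metric either, so ROC AUC (computed here as the Mann-Whitney U statistic) is an assumed choice for comparing the ensemble with its components. The scores are made-up toy values, not data from the study.

```python
from statistics import mean

# Hypothetical predictor names: the abstract does not say which three tools
# the ensemble combines, so these are placeholders.
TOP_PREDICTORS = ["tool_a", "tool_b", "tool_c"]


def ensemble_score(scores: dict[str, float], predictors: list[str]) -> float:
    """Unweighted average of the selected predictors' scores for one variant.

    Assumes each score is already on a common [0, 1] scale where higher means
    more likely deleterious; raw outputs of real tools may need rescaling.
    """
    return mean(scores[name] for name in predictors)


def auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC computed as the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy scores (not data from the paper): 1 = deleterious, 0 = benign.
# Each tool misranks a different variant pair.
variants = [
    {"tool_a": 0.9, "tool_b": 0.3, "tool_c": 0.8},
    {"tool_a": 0.3, "tool_b": 0.8, "tool_c": 0.9},
    {"tool_a": 0.4, "tool_b": 0.2, "tool_c": 0.85},
    {"tool_a": 0.1, "tool_b": 0.4, "tool_c": 0.2},
]
labels = [1, 1, 0, 0]

combined = [ensemble_score(v, TOP_PREDICTORS) for v in variants]
for name in TOP_PREDICTORS:
    print(name, auc(labels, [v[name] for v in variants]))  # 0.75 for each tool
print("ensemble", auc(labels, combined))  # 1.0
```

In this contrived example each individual tool reaches an AUC of 0.75 because its single error falls on a different variant, while the averaged score ranks both deleterious variants above both benign ones. Averaging helps only when the components' errors are not strongly correlated; the toy data are chosen to show that effect and say nothing about how large the gain is on real benchmarks.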
