regSNPs-splicing: a tool for prioritizing synonymous single-nucleotide substitution

While synonymous single-nucleotide variants (sSNVs) have been largely understudied because they do not alter protein sequence, mounting evidence suggests that they may affect RNA conformation, splicing, and the stability of nascent mRNAs, and thereby promote various diseases. Accurately prioritizing deleterious sSNVs from a pool of neutral ones can significantly improve our ability to select functional genetic variants identified in genome-sequencing projects and, therefore, advance our understanding of disease etiology. In this study, we develop a computational algorithm to prioritize sSNVs based on their impact on mRNA splicing and protein function. In addition to genomic features that potentially affect splicing regulation, our proposed algorithm also includes dozens of structural features that characterize the impact of alternatively spliced exons on protein function. Our systematic evaluation on thousands of sSNVs suggests that several structural features, including intrinsic disorder scores, solvent-accessible surface areas, protein secondary structures, and known and predicted protein family domains, differ significantly between disease-causing and neutral sSNVs. Our results suggest that protein structural features offer an added dimension of information for distinguishing disease-causing from neutral synonymous variants. The inclusion of structural features increases the predictive accuracy of functional sSNV prioritization.
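As a minimal sketch of how one feature might be compared between the two variant classes, the snippet below computes a rank-based AUC: the probability that a randomly chosen disease-causing sSNV has a higher feature value than a randomly chosen neutral one. This is an illustration only and is not the evaluation procedure used in the study; the function name `feature_auc` and the toy scores are assumptions made up for the example.

```python
from typing import Sequence


def feature_auc(disease: Sequence[float], neutral: Sequence[float]) -> float:
    """Rank-based AUC for one feature (hypothetical helper).

    Returns the fraction of (disease, neutral) pairs in which the
    disease-causing variant has the higher feature value; ties count half.
    0.5 means the feature does not separate the groups at all.
    """
    if not disease or not neutral:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for d in disease:
        for n in neutral:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(disease) * len(neutral))


# Made-up values for a hypothetical structural feature
# (for instance, a predicted disorder score); not data from the study.
disease_scores = [0.8, 0.7, 0.9, 0.6]
neutral_scores = [0.3, 0.5, 0.6, 0.2]

auc = feature_auc(disease_scores, neutral_scores)
print(auc)  # 0.96875
```

A value near 0.5 would indicate that the feature carries little discriminative signal on its own, while values near 0 or 1 indicate strong separation in one direction or the other.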
