Global Analysis of Human mRNA Folding Disruptions in Synonymous Variants Demonstrates Significant Population Constraint

Background: In most organisms, the structure of an mRNA molecule is a crucial determinant of its translation speed, half-life, splicing propensity, and the final conformation of the encoded protein. Synonymous mutations that distort the wild-type mRNA structure may therefore be pathogenic. However, current clinical guidelines classify synonymous or “silent” single nucleotide variants (sSNVs) as largely benign unless a role in RNA splicing can be demonstrated.

Results: We developed novel software to conduct a global transcriptome study in which RNA folding statistics were computed for 469 million SNVs in 45,800 transcripts, using an Apache Spark implementation of the ViennaRNA software package in the cloud. Focusing our analysis on the subset of 17.9 million sSNVs, we find that variants predicted to disrupt mRNA structure occur at lower rates in the human population. Because the community lacks tools to evaluate the potential pathogenic impact of sSNVs, we introduce a “Structural Predictivity Index” (SPI) to quantify this constraint due to mRNA structure.

Conclusion: Our findings support the hypothesis that sSNVs may play a role in human genetic diseases through their effects on mRNA structure. The SPI score and our computed Vienna metrics provide a means of gauging the structural constraint operating on any sSNV. Given that up to 75% of patients with a suspected rare genetic disease lack a molecular diagnosis, our score has the potential to enable discovery of novel etiologies in human genetic disease. Our RNA Stability Pipeline, together with Vienna structural metrics and SPI scores for all human sSNVs, can be downloaded from GitHub at https://github.com/nch-igm/rna-stability.
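To make concrete what the sSNV subset in the Results refers to, the following is a minimal sketch that enumerates every single-base change in a coding sequence that leaves the encoded amino acid unchanged. It is not taken from the authors' pipeline: the function name `synonymous_snvs` is hypothetical, the standard genetic code is assumed, and the structural scoring step (ViennaRNA folding of wild-type versus variant sequence) is deliberately left out.

```python
from itertools import product

# Standard genetic code (assumed), with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_snvs(cds):
    """Yield (position, ref, alt) for each single-base change that keeps the amino acid.

    `cds` is an in-frame coding sequence (DNA or RNA alphabet); positions are 0-based.
    Stop-to-stop changes are also reported, since the translated symbol is unchanged.
    Hypothetical helper for illustration only.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    for start in range(0, len(cds), 3):
        codon = cds[start:start + 3]
        aa = CODON_TABLE[codon]
        for offset in range(3):
            for alt in BASES:
                if alt == codon[offset]:
                    continue  # not a variant
                mutant = codon[:offset] + alt + codon[offset + 1:]
                if CODON_TABLE[mutant] == aa:
                    yield (start + offset, codon[offset], alt)


if __name__ == "__main__":
    # Leucine codon CTG: list its synonymous single-base substitutions.
    for variant in synonymous_snvs("CTG"):
        print(variant)
```

In a study like the one described, each enumerated variant would then be passed to a folding tool so that structural metrics for the variant and wild-type transcripts can be compared; that step depends on the ViennaRNA installation and is not sketched here.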
