RNA Thermodynamic Structural Entropy

Conformational entropy for atomic-level, three-dimensional biomolecules is known experimentally to play an important role in protein-ligand discrimination, yet reliable computation of entropy remains a difficult problem. Here we describe the first two accurate and efficient algorithms to compute the conformational entropy for RNA secondary structures, with respect to the Turner energy model, where free energy parameters are determined from UV absorption experiments. An algorithm to compute the derivational entropy for RNA secondary structures had previously been introduced, using stochastic context-free grammars (SCFGs). However, the numerical value of derivational entropy depends heavily on the chosen context-free grammar and on the training set used to estimate rule probabilities. Using data from the Rfam database, we determine that both of our thermodynamic methods, which agree in numerical value, are substantially faster than the SCFG method. Thermodynamic structural entropy is much smaller than derivational entropy, and the correlation between length-normalized thermodynamic entropy and derivational entropy is moderately weak to poor. In applications, we (i) plot the structural entropy as a function of temperature for known thermoswitches, such as the repression of heat shock gene expression (ROSE) element; (ii) determine that the correlation between hammerhead ribozyme cleavage activity and total free energy is improved by including an additional free energy term arising from conformational entropy; and (iii) plot the structural entropy of windows of the HIV-1 genome. Our software RNAentropy can compute structural entropy for any user-specified temperature, and supports both the Turner’99 and Turner’04 energy parameters. It follows that RNAentropy is state-of-the-art software to compute RNA secondary structure conformational entropy.
Source code is available at https://github.com/clotelab/RNAentropy/; a full web server is available at http://bioinformatics.bc.edu/clotelab/RNAentropy, including source code and ancillary programs.
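
To make the quantity concrete, the following is a minimal sketch of how structural entropy could be evaluated for a toy ensemble whose structure energies are listed explicitly. It is not the RNAentropy algorithm, which avoids enumerating structures. The function name `boltzmann_entropy`, the example energies, and the default temperature of 310.15 K are assumptions made for illustration. The sketch computes the entropy directly as S = -R Σ p(s) ln p(s) over Boltzmann probabilities, and again from the standard identity S = (⟨E⟩ - G)/T with G = -RT ln Z; the two values should agree.

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_entropy(energies, temp_k=310.15):
    """Entropy of a Boltzmann ensemble, computed two equivalent ways.

    energies: free energies in kcal/mol, one per structure (toy input,
              listed explicitly; a hypothetical stand-in for a real ensemble).
    Returns (s_direct, s_thermo) in kcal/(mol*K).
    """
    rt = R * temp_k
    weights = [math.exp(-e / rt) for e in energies]  # Boltzmann factors
    z = sum(weights)                                 # partition function Z
    probs = [w / z for w in weights]

    # Direct definition: S = -R * sum_s p(s) ln p(s)
    s_direct = -R * sum(p * math.log(p) for p in probs if p > 0.0)

    # Thermodynamic identity: S = (<E> - G) / T, where G = -RT ln Z
    mean_e = sum(p * e for p, e in zip(probs, energies))
    g = -rt * math.log(z)
    s_thermo = (mean_e - g) / temp_k
    return s_direct, s_thermo


if __name__ == "__main__":
    toy_energies = [0.0, -1.2, -3.4, -0.5]  # made-up values, kcal/mol
    s1, s2 = boltzmann_entropy(toy_energies)
    print(f"direct: {s1:.6e}  via <E> and Z: {s2:.6e}")
```

For an ensemble in which all n structures have equal energy, both values reduce to R ln n, which gives a quick sanity check on the sketch.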
