Extracting stacking interaction parameters for RNA from the data set of native structures

A crucial step in determining the three-dimensional native structures of RNA is predicting their secondary structures, which are stable independently of the tertiary fold. Accurate prediction of the secondary structure requires context-dependent estimates of the interaction parameters. We have exploited the growing database of natively folded RNA structures in the Protein Data Bank (PDB) to obtain stacking interaction parameters using a knowledge-based approach. Remarkably, the values of the resulting statistical potentials (SPs) are in excellent agreement with the parameters determined from measurements on small oligonucleotides. We validate the SPs by using them in the ViennaRNA package to predict secondary structures: 74% of the base pairs in a dataset of structures are predicted correctly. Interestingly, this figure is similar to that obtained with the measured thermodynamic parameters. We also tested the efficacy of the SPs in predicting secondary structure by using gapless threading, which we advocate as an alternative method for rapidly predicting RNA structures. For RNA molecules with fewer than 700 nucleotides, about 70% of the native base pairs are correctly predicted. As a further validation of the SPs, we calculated Z-scores, which measure the relative stability of the native state with respect to a manifold of higher free energy states. The computed Z-scores agree with estimates made from calorimetric measurements for a few RNA molecules. Structural analysis was used to rationalize the successes and failures of the SPs and the experimentally determined parameters. First, from the near-perfect linear relationship between the number of native base pairs and sequence length, we show that nearly 46% of nucleotides are not in stacks. Second, by analyzing the suboptimal structures generated in gapless threading, we show that the SPs and the experimentally determined parameters are most successful in predicting stacks that end in hairpins.
These results show that further improvement in secondary structure prediction requires reliable estimates of interaction parameters for loops, bulges, and stacks that do not end in hairpins.
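
To illustrate what a knowledge-based stacking potential involves, here is a minimal Python sketch of the inverse-Boltzmann idea, E = -kT ln(f_obs / f_ref). It is an illustration under stated assumptions rather than the authors' procedure: the abstract does not give the reference state, so this sketch assumes a quasichemical-style reference in which the two base pairs of a stack are drawn independently from their overall frequencies. The input format (a list of (outer, inner) pair-type tuples) and the function name `statistical_potentials` are hypothetical.

```python
import math
from collections import Counter


def statistical_potentials(stacks, kT=1.0):
    """Knowledge-based stacking energies from observed stack counts (sketch).

    Each stack is an (outer, inner) tuple of base-pair types, each written
    5' nucleotide first, e.g. ("GC", "CG") for G(i)-C(j) stacked on
    C(i+1)-G(j-1). Assumed reference state: the two pairs of a stack are
    drawn independently from their overall frequencies.
    """
    stack_counts = Counter(stacks)
    n_stacks = len(stacks)

    # Marginal frequency of each pair type, counting both positions of a stack.
    pair_counts = Counter()
    for outer, inner in stacks:
        pair_counts[outer] += 1
        pair_counts[inner] += 1
    n_pairs = 2 * n_stacks

    energies = {}
    for (outer, inner), n in stack_counts.items():
        f_obs = n / n_stacks
        f_ref = (pair_counts[outer] / n_pairs) * (pair_counts[inner] / n_pairs)
        # Inverse Boltzmann: E = -kT ln(f_obs / f_ref); negative means favorable.
        energies[(outer, inner)] = -kT * math.log(f_obs / f_ref)
    return energies
```

With this reference state, a stack observed exactly as often as independent drawing predicts scores zero, and over-represented stacks score negative (favorable). In practice the counts would come from stacks extracted from PDB structures, and the choice of reference state strongly affects the resulting values.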

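The gapless threading and Z-score ideas can be made concrete with a short Python sketch. It is not the authors' code: the template representation (a sequence length plus a list of 0-based base-pair indices), the rule that only canonical and G-U pairs score, the toy SP value, and the helper names `stacks_of`, `thread_energy`, and `z_score` are all hypothetical. A sequence of length N is placed, without gaps, on every window of N consecutive positions in each template; each window yields one decoy energy, and the Z-score measures how far the native energy lies below the decoy distribution.

```python
import statistics

# Assumption: canonical Watson-Crick pairs plus the G-U wobble may stack.
ALLOWED_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def stacks_of(pairs):
    """Stacks (i, j) of a template: (i, j) and (i + 1, j - 1) are both base pairs."""
    pair_set = set(pairs)
    return [(i, j) for (i, j) in pair_set if (i + 1, j - 1) in pair_set]


def thread_energy(seq, pairs, offset, sp):
    """Energy of seq mounted without gaps on template positions offset .. offset + len(seq) - 1."""
    energy = 0.0
    for i, j in stacks_of(pairs):
        lo, hi = i - offset, j - offset
        if lo < 0 or hi >= len(seq):  # the stack must lie inside the threaded window
            continue
        outer = seq[lo] + seq[hi]
        inner = seq[lo + 1] + seq[hi - 1]
        # Stacks whose bases cannot pair contribute nothing (an assumption).
        if outer in ALLOWED_PAIRS and inner in ALLOWED_PAIRS:
            energy += sp.get((outer, inner), 0.0)
    return energy


def z_score(seq, native_pairs, templates, sp):
    """Z = (E_native - mean(E_decoy)) / std(E_decoy) over all gapless threadings.

    templates is a list of (length, pairs) for other structures; every offset
    at which seq fits without gaps yields one decoy energy.
    """
    e_native = thread_energy(seq, native_pairs, 0, sp)
    decoys = [
        thread_energy(seq, pairs, k, sp)
        for length, pairs in templates
        for k in range(length - len(seq) + 1)
    ]
    # Raises if there are no decoys or if all decoy energies are identical.
    return (e_native - statistics.mean(decoys)) / statistics.pstdev(decoys)


# Hypothetical SP value and toy templates, for illustration only.
sp = {("GC", "GC"): -3.0}
templates = [(9, []), (9, [(0, 8), (1, 7)])]
z = z_score("GGGAAACCC", [(0, 8), (1, 7), (2, 6)], templates, sp)
print(z)  # -3.0
```

Here the native hairpin has two GC/GC stacks (E = -6.0) while the two decoys score 0.0 and -3.0, so the native state sits three standard deviations below the decoy mean. A strongly negative Z-score is the signature of a potential that singles out the native fold.
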