RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble.

Prediction of RNA secondary structure by free energy minimization has been the standard for over two decades. Here we describe a novel method that departs from this paradigm and instead bases predictions on the Boltzmann-weighted structure ensemble. We introduce the notion of a centroid structure as a representative for a set of structures and describe a procedure for identifying it. Compared with the minimum free energy (MFE) structure across diverse types of structural RNAs, the centroid of the ensemble makes 30.0% fewer prediction errors, as measured by the positive predictive value (PPV), with marginally improved sensitivity. The Boltzmann ensemble can be separated into a small number of clusters (3.2 on average). Among the centroids of these clusters, the "best cluster centroid", as determined by comparison to the known structure, simultaneously improves PPV by 46.5% and sensitivity by 21.7%. For the 58% of studied sequences in which the MFE structure lies outside the cluster containing the best centroid, the best centroid improves PPV by 62.5% and sensitivity by 31.4%. These results suggest that, under the current incomplete energy model, the energy well containing the MFE structure often differs from the well that, under the unavailable complete model, would presumably contain the unique native structure. Centroids are available on the Sfold server at http://sfold.wadsworth.org.
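The centroid idea can be illustrated in a few lines. The sketch below is a hedged illustration, not Sfold's implementation. It assumes the ensemble is represented by a set of structures already sampled from the Boltzmann distribution (producing that sample is out of scope here), that structures are written in dot-bracket notation, and that base-pair distance (the number of pairs found in exactly one of two structures) is the metric. Under that metric, the structure minimizing the summed distance to the sample consists of exactly the pairs that occur in more than half of the sampled structures. Any two such pairs must co-occur in at least one sampled structure, so the result is itself a valid secondary structure. PPV and sensitivity use their usual definitions: correctly predicted pairs divided by predicted pairs, and by known pairs, respectively. All function names (`parse_dot_bracket`, `pair_frequencies`, `centroid`, `bp_distance`, `ppv_sensitivity`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]  # (i, j) with 0-based positions and i < j


def parse_dot_bracket(db: str) -> Set[Pair]:
    """Convert a dot-bracket string into a set of base pairs."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for k, ch in enumerate(db):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {k}")
            pairs.add((stack.pop(), k))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def pair_frequencies(sample: Iterable[Set[Pair]]) -> Dict[Pair, float]:
    """Estimate base-pair probabilities as frequencies in a (Boltzmann) sample."""
    counts: Counter = Counter()
    n = 0
    for structure in sample:
        counts.update(structure)
        n += 1
    return {p: c / n for p, c in counts.items()}


def centroid(freqs: Dict[Pair, float]) -> Set[Pair]:
    """Pairs present in more than half of the sample.

    Including pair p adds (1 - f_p) to the mean base-pair distance to the
    sample and excluding it adds f_p, so keeping exactly the pairs with
    f_p > 0.5 minimizes the total distance, one pair at a time."""
    return {p for p, f in freqs.items() if f > 0.5}


def bp_distance(a: Set[Pair], b: Set[Pair]) -> int:
    """Base-pair distance: number of pairs found in exactly one structure."""
    return len(a ^ b)


def ppv_sensitivity(predicted: Set[Pair], known: Set[Pair]) -> Tuple[float, float]:
    """PPV = TP / predicted pairs; sensitivity = TP / known pairs."""
    tp = len(predicted & known)
    ppv = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(known) if known else 0.0
    return ppv, sensitivity


if __name__ == "__main__":
    # Toy sample standing in for structures drawn from the Boltzmann ensemble.
    sample = [parse_dot_bracket(s) for s in ["((..))", "((..))", "(....)", "......"]]
    c = centroid(pair_frequencies(sample))
    known = parse_dot_bracket("((..))")
    print(sorted(c), ppv_sensitivity(c, known), bp_distance(c, known))
```

On this toy sample the pair (1, 4) occurs in exactly half of the structures, so the strict `> 0.5` threshold leaves it out. Including it would give the same summed distance, which is why the handling of ties is a convention rather than a correctness issue.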
