FlexStem: improving predictions of RNA secondary structures with pseudoknots by reducing the search space

MOTIVATION RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is proved to be NP-hard. Due to kinetic reasons the real RNA secondary structure often has local instead of global minimum free energy. This implies that we may improve the performance of RNA secondary structure prediction by taking kinetics into account and minimize free energy in a local area. RESULT we propose a novel algorithm named FlexStem to predict RNA secondary structures with pseudoknots. Still based on MFE criterion, FlexStem adopts comprehensive energy models that allow complex pseudoknots. Unlike classical thermodynamic methods, our approach aims to simulate the RNA folding process by successive addition of maximal stems, reducing the search space while maintaining or even improving the prediction accuracy. This reduced space is constructed by our maximal stem strategy and stem-adding rule induced from elaborate statistical experiments on real RNA secondary structures. The strategy and the rule also reflect the folding characteristic of RNA from a new angle and help compensate for the deficiency of merely relying on MFE in RNA structure prediction. We validate FlexStem by applying it to tRNAs, 5SrRNAs and a large number of pseudoknotted structures and compare it with the well-known algorithms such as RNAfold, PKNOTS, PknotsRG, HotKnots and ILM according to their overall sensitivities and specificities, as well as positive and negative controls on pseudoknots. The results show that FlexStem significantly increases the prediction accuracy through its local search strategy. AVAILABILITY Software is available at http://pfind.ict.ac.cn/FlexStem/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

[1]  Sergey Steinberg,et al.  Compilation of tRNA sequences and sequences of tRNA genes , 2004, Nucleic Acids Res..

[2]  Stephen Neidle,et al.  Principles of nucleic acid structure , 2007 .

[3]  P. Higgs RNA secondary structure: physical and computational aspects , 2000, Quarterly Reviews of Biophysics.

[4]  Michael Zuker,et al.  Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information , 1981, Nucleic Acids Res..

[5]  Bjarne Knudsen,et al.  RNA secondary structure prediction using stochastic context-free grammars and evolutionary history , 1999, Bioinform..

[6]  Robert Giegerich,et al.  Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics , 2004, BMC Bioinformatics.

[7]  Maciej Szymanski,et al.  5S Ribosomal RNA Database , 2002, Nucleic Acids Res..

[8]  Tatsuya Akutsu,et al.  Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots , 2000, Discret. Appl. Math..

[9]  E Rivas,et al.  A dynamic programming algorithm for RNA structure prediction including pseudoknots. , 1998, Journal of molecular biology.

[10]  R. Durbin,et al.  RNA sequence analysis using covariance models. , 1994, Nucleic acids research.

[11]  D. Turner,et al.  Predicting thermodynamic properties of RNA. , 1995, Methods in enzymology.

[12]  E. Siggia,et al.  Modeling RNA folding paths with pseudoknots: application to hepatitis delta virus ribozyme. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Weixiong Zhang,et al.  An Iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots , 2004, Bioinform..

[14]  H. Hoos,et al.  HotKnots: heuristic prediction of RNA secondary structures including pseudoknots. , 2005, RNA.

[15]  James W. Brown The ribonuclease P database , 1998, Nucleic Acids Res..

[16]  D. Turner,et al.  Improved free-energy parameters for predictions of RNA duplex stability. , 1986, Proceedings of the National Academy of Sciences of the United States of America.

[17]  Éva Tardos,et al.  Algorithm design , 2005 .

[18]  A. E. Walter,et al.  Coaxial stacking of helixes enhances binding of oligoribonucleotides and improves predictions of RNA folding. , 1994, Proceedings of the National Academy of Sciences of the United States of America.

[19]  Christian N. S. Pedersen,et al.  RNA Pseudoknot Prediction in Energy-Based Models , 2000, J. Comput. Biol..

[20]  Niles A. Pierce,et al.  A partition function algorithm for nucleic acid secondary structure including pseudoknots , 2003, J. Comput. Chem..

[21]  C. Pleij,et al.  The computer simulation of RNA folding pathways using a genetic algorithm. , 1995, Journal of molecular biology.

[22]  Hesham H. Ali,et al.  High sensitivity RNA pseudoknot prediction , 2006, Nucleic acids research.

[23]  Anne Condon,et al.  Classifying RNA pseudoknotted structures , 2004, Theor. Comput. Sci..

[24]  Ivo L. Hofacker,et al.  Vienna RNA secondary structure server , 2003, Nucleic Acids Res..

[25]  J. Abrahams,et al.  Prediction of RNA secondary structure, including pseudoknotting, by computer simulation. , 1990, Nucleic acids research.

[26]  J. Ng,et al.  PseudoBase: a database with RNA pseudoknots , 2000, Nucleic Acids Res..

[27]  J. Sabina,et al.  Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. , 1999, Journal of molecular biology.