IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming

Motivation: Pseudoknots found in the secondary structures of a number of functional RNAs play various roles in biological processes. Recent methods for predicting RNA secondary structures cover certain classes of pseudoknotted structures, but only a few of them achieve satisfactory predictions in terms of both speed and accuracy.

Results: We propose IPknot, a novel computational method for predicting RNA secondary structures with pseudoknots based on maximizing the expected accuracy of a predicted structure. IPknot decomposes a pseudoknotted structure into a set of pseudoknot-free substructures and approximates a base-pairing probability distribution that considers pseudoknots, which allows it to model a wide class of pseudoknots while running quickly. In addition, we propose a heuristic algorithm for refining base-pairing probabilities to improve the prediction accuracy of IPknot. The problem of maximizing expected accuracy is solved by integer programming with a threshold cut. We also extend IPknot so that it can predict the consensus secondary structure with pseudoknots when a multiple sequence alignment is given. IPknot is validated through extensive experiments on various datasets, which show that it achieves better prediction accuracy and shorter running times than several competitive prediction methods.

Availability: The IPknot program is available at http://www.ncrna.org/software/ipknot/. IPknot is also available as a web server at http://rna.naist.jp/ipknot/.

Contact: satoken@k.u-tokyo.ac.jp; ykato@is.naist.jp

Supplementary information: Supplementary data are available at Bioinformatics online.
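The two ideas named in the Results, decomposition into pseudoknot-free substructures and a threshold cut on base-pairing probabilities, can be illustrated with a toy sketch. It is not the IPknot implementation: the function names, the single shared threshold `theta`, the objective (sum of `p - theta` over selected pairs), and the toy probabilities are all assumptions made for illustration, and exhaustive search stands in for an integer programming solver. Each candidate pair is assigned to a level (0 means unused), every base may pair at most once, and pairs within the same level must not cross.

```python
from itertools import product

def crosses(a, b):
    """Return True if base pairs a=(i, j) and b=(k, l) cross each other."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l

def predict_levels(probs, theta, n_levels):
    """Toy stand-in for the integer program (hypothetical helper).

    probs    : dict mapping (i, j) base pairs to pairing probabilities
    theta    : threshold; pairs with probability <= theta are cut
    n_levels : number of pseudoknot-free substructures allowed
    Returns a dict mapping each selected pair to its level (1..n_levels).
    """
    # Threshold cut: discard low-probability pairs before searching.
    candidates = [pair for pair, p in probs.items() if p > theta]
    best_score, best = 0.0, {}
    # Exhaustive search over level assignments (0 = pair not used).
    for levels in product(range(n_levels + 1), repeat=len(candidates)):
        chosen = [(pair, q) for pair, q in zip(candidates, levels) if q > 0]
        # Constraint 1: each base takes part in at most one pair.
        bases = [b for pair, _ in chosen for b in pair]
        if len(bases) != len(set(bases)):
            continue
        # Constraint 2: no crossing pairs inside the same level.
        if any(qa == qb and crosses(pa, pb)
               for n, (pa, qa) in enumerate(chosen)
               for pb, qb in chosen[n + 1:]):
            continue
        # Assumed toy objective: reward probability mass above the threshold.
        score = sum(probs[pair] - theta for pair, _ in chosen)
        if score > best_score:
            best_score, best = score, dict(chosen)
    return best

# Made-up probabilities: two nested stems that cross each other.
toy_probs = {
    (0, 8): 0.9, (1, 7): 0.8,    # first stem
    (4, 11): 0.7, (5, 10): 0.6,  # second stem, crossing the first
    (2, 6): 0.1,                 # removed by the threshold cut
}

if __name__ == "__main__":
    print(predict_levels(toy_probs, theta=0.5, n_levels=2))
```

With two levels the sketch can keep both stems by placing them in different levels; with a single level it can only keep a set of non-crossing pairs. The exhaustive search grows exponentially with the number of candidate pairs, so it is only meant for tiny inputs like the one above.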
