Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars

Background
Predicting the structure of large RNAs remains a particular challenge in bioinformatics, because state-of-the-art algorithms are computationally expensive and have low accuracy. The pfold model couples a stochastic context-free grammar to phylogenetic analysis to achieve high prediction accuracy, but the time complexity of the algorithm and floating-point underflow errors have prevented its use on long alignments. Here we present PPfold, a multithreaded version of pfold that predicts the structure of large RNA alignments accurately on practical timescales.

Results
PPfold distributes both the phylogenetic calculations and the inside-outside algorithm across threads, which significantly reduces runtime on multicore machines. We have addressed the floating-point underflow problems of pfold by implementing an extended-exponent datatype, which enables PPfold to be used for large-scale RNA structure predictions. We have also improved the user interface and portability: in addition to a standalone executable and the Java source code, PPfold is available as a free plugin for the CLC Workbenches. We evaluated the accuracy of PPfold on the BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine, identifying several known structural elements in the prediction.

Conclusions
PPfold is the first parallelized comparative RNA structure prediction algorithm to date. Based on the pfold model, PPfold makes fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts. The techniques used to parallelize this algorithm may be applicable to other bioinformatics algorithms.
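The Results describe distributing the inside-outside algorithm across threads. One standard way to expose that parallelism is to fill the dynamic-programming tables one anti-diagonal (span length) at a time: every cell on a diagonal depends only on strictly shorter spans, so all cells on that diagonal can be computed concurrently, with a barrier before the next diagonal. The sketch below applies this to the inside pass of the three-nonterminal grammar used by pfold (S → LS | L, F → dFd | LS, L → s | dFd). It is a single-sequence toy: the rule probabilities, the emission values standing in for pfold's phylogenetic column likelihoods, and the names `cell` and `inside` are all hypothetical, and PPfold's actual division of work may differ.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule probabilities for S -> LS | L, F -> dFd | LS, L -> s | dFd.
RULE = {"S>LS": 0.8, "S>L": 0.2, "F>dFd": 0.6, "F>LS": 0.4, "L>s": 0.9, "L>dFd": 0.1}
CANONICAL = {"AU", "UA", "GC", "CG"}
WOBBLE = {"GU", "UG"}


def emit_unpaired(base):
    return 0.25  # toy uniform emission; pfold uses phylogenetic column likelihoods


def emit_pair(a, b):
    pair = a + b
    if pair in CANONICAL:
        return 0.16
    if pair in WOBBLE:
        return 0.10
    return 0.016  # the 10 non-canonical pairs share the remaining 0.16


def cell(seq, L, S, F, i, j):
    """Inside values (L, S, F) for span seq[i..j]; reads only shorter spans."""
    l = RULE["L>s"] * emit_unpaired(seq[i]) if i == j else 0.0
    f = 0.0
    if j - i >= 2:
        # d F d: outer bases i and j pair around the inner span.
        paired = emit_pair(seq[i], seq[j]) * F[i + 1][j - 1]
        l += RULE["L>dFd"] * paired
        f += RULE["F>dFd"] * paired
    # Bifurcation L(i..k) S(k+1..j), shared by S -> LS and F -> LS.
    split = sum(L[i][k] * S[k + 1][j] for k in range(i, j))
    f += RULE["F>LS"] * split
    s = RULE["S>L"] * l + RULE["S>LS"] * split
    return l, s, f


def inside(seq, workers=1):
    """Return P(seq) = S(0, n-1), filling the tables diagonal by diagonal."""
    n = len(seq)
    L, S, F = ([[0.0] * n for _ in range(n)] for _ in range(3))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(n):
            starts = range(n - d)
            # All cells with span length d are independent of each other.
            results = list(pool.map(lambda i: cell(seq, L, S, F, i, i + d), starts))
            # Barrier: write the diagonal only after every cell has been computed.
            for i, (l, s, f) in zip(starts, results):
                L[i][i + d], S[i][i + d], F[i][i + d] = l, s, f
    return S[0][n - 1]


print(inside("GGGAAAUCCC", workers=4) == inside("GGGAAAUCCC", workers=1))  # True
```

Because each cell performs the same floating-point operations in the same order regardless of which thread runs it, the multithreaded and single-threaded fills give identical results. In CPython the global interpreter lock prevents a real speedup for this pure-Python arithmetic; the sketch only illustrates the dependency structure, which carries over to runtimes such as the JVM where threads execute truly in parallel.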
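The Results mention an extended-exponent datatype as the fix for pfold's underflow. Inside-outside probabilities for a long alignment are products of thousands of small factors, and an IEEE-754 double cannot represent values below roughly 1e-308 (about 5e-324 with subnormals), so such products collapse to zero. A common remedy is to store a double mantissa together with a separate, wider integer exponent. The minimal sketch below shows that idea; the class name `ExtDouble` and its methods are hypothetical and are not PPfold's API.

```python
import math


class ExtDouble:
    """Hypothetical extended-exponent number: value = mantissa * 2**exponent.

    The mantissa is kept in [0.5, 1) (or 0) and the exponent is an unbounded
    Python int, so products of many small probabilities do not underflow.
    """

    __slots__ = ("mantissa", "exponent")

    def __init__(self, value=0.0, exponent=0):
        m, e = math.frexp(value)  # value == m * 2**e with 0.5 <= |m| < 1
        self.mantissa = m
        self.exponent = e + exponent if m != 0.0 else 0

    def __mul__(self, other):
        # Multiply mantissas, add exponents; __init__ renormalizes the result.
        return ExtDouble(self.mantissa * other.mantissa, self.exponent + other.exponent)

    def __add__(self, other):
        if self.mantissa == 0.0:
            return other
        if other.mantissa == 0.0:
            return self
        big, small = (self, other) if self.exponent >= other.exponent else (other, self)
        # Shift the smaller operand onto the larger exponent; if the gap is huge,
        # ldexp returns 0.0, which is harmless because that term is negligible.
        shifted = math.ldexp(small.mantissa, small.exponent - big.exponent)
        return ExtDouble(big.mantissa + shifted, big.exponent)

    def log10(self):
        """Base-10 logarithm, valid even far below the double range."""
        return math.log10(self.mantissa) + self.exponent * math.log10(2)

    def to_float(self):
        """Convert back to a plain double (may underflow to 0.0)."""
        return math.ldexp(self.mantissa, self.exponent)


plain = 1.0
ext = ExtDouble(1.0)
factor = ExtDouble(0.01)
for _ in range(1000):  # the true product is 1e-2000
    plain *= 0.01
    ext = ext * factor
print(plain)        # 0.0: the plain double has underflowed
print(ext.log10())  # approximately -2000
```

Keeping the mantissa normalized after every operation bounds the rounding error per step to that of an ordinary double multiplication, while the integer exponent absorbs the scale that would otherwise underflow.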
