GTfold: Enabling parallel RNA secondary structure prediction on multi-core desktops

Background
Accurate and efficient RNA secondary structure prediction remains an important open problem in computational molecular biology. Historically, advances in computing technology have enabled faster and more accurate RNA secondary structure predictions. Previous parallelized prediction programs achieved significant improvements in runtime, but their implementations were tied to niche high-performance computers and were neither portable nor easily accessible to most RNA researchers. With the increasing prevalence of multi-core desktop machines, a new parallel prediction program is needed to take full advantage of today's computing technology.

Findings
We present here the first implementation of RNA secondary structure prediction by thermodynamic optimization for modern multi-core computers. We show that, on machines with four or more cores, GTfold predicts secondary structure in less time than UNAfold and RNAfold without sacrificing accuracy.

Conclusions
GTfold supports advances in RNA structural biology by reducing the time required for secondary structure prediction. The speedup will be particularly valuable to researchers working with lengthy RNA sequences, such as RNA viral genomes.
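The abstract does not say how the prediction is split across cores. As an illustration only, the sketch below shows one common way such dynamic-programming folds can be parallelized: the table is filled one anti-diagonal (span j - i) at a time, and the cells on a single diagonal are computed concurrently because each depends only on shorter spans. This is not GTfold's code. It assumes a simplified Nussinov-style base-pair maximization in place of the thermodynamic (nearest-neighbor free energy) model, and the names `PAIRS`, `MIN_LOOP`, `fill_cell`, and `fold` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scoring rule: +1 per Watson-Crick or G-U wobble pair (Nussinov-style
# base-pair maximization), standing in for a nearest-neighbor free-energy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases closed by a hairpin


def fill_cell(seq, table, i, j):
    """Best score for seq[i..j], reading only cells with a shorter span than j - i."""
    best = table[i][j - 1]  # case 1: base j is unpaired
    for k in range(i, j - MIN_LOOP):  # case 2: base j pairs with base k
        if (seq[k], seq[j]) in PAIRS:
            left = table[i][k - 1] if k > i else 0
            best = max(best, left + table[k + 1][j - 1] + 1)
    return best


def fold(seq, workers=4):
    """Fill the DP table one anti-diagonal (span d = j - i) at a time."""
    n = len(seq)
    if n == 0:
        return 0
    table = [[0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(1, n):
            starts = range(n - d)
            # Cells on one diagonal depend only on earlier diagonals, so they are
            # independent of each other and can be computed concurrently.
            scores = list(pool.map(lambda i, d=d: fill_cell(seq, table, i, i + d), starts))
            # Barrier: every cell of diagonal d is stored before diagonal d + 1 starts.
            for i, score in zip(starts, scores):
                table[i][i + d] = score
    return table[0][n - 1]


print(fold("GGGAAAUCC"))  # 3 pairs: G-C, G-C, G-U around an AAA hairpin loop
```

The diagonal-by-diagonal order is what makes the work safe to share among cores: no cell reads a value computed on its own diagonal. In CPython the global interpreter lock prevents pure-Python threads from running this arithmetic in parallel, so the sketch shows the dependency structure rather than a real speedup; a native implementation would use compiled threads (for example OpenMP in C) for the same loop.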
