A Worst-Case and Practical Speedup for the RNA Co-folding Problem Using the Four-Russians Idea

The computational formulation for finding the optimal simultaneous alignment and fold (the optimal co-fold) of RNA sequences was first introduced by Sankoff in 1985. Since then, the importance of co-folding has grown, as conservation of structure and its relationship to function have been widely observed in RNA. For two sequences, the running time of Sankoff's algorithm is Θ(N^6). Existing literature on co-folding attempts to improve efficiency by simplifying the original problem formulation. We present here a practical and worst-case speedup using the Four-Russians method, without placing any added constraints on the types of alignments or folds allowed. Our algorithm, Fast Cofold, finds the optimal co-fold in O(N^6/log(N^2)) time, a speedup that is also observed in practice. Because the solution matrix produced by our algorithm is identical to the one produced by the Sankoff algorithm, the contribution of the algorithm lies not only in its standalone practicality but also in the ability to combine it with heuristic speedups, leading to even greater reductions in running time.
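The Four-Russians method named in the abstract is easiest to see in its classic setting, Boolean matrix multiplication, rather than in co-folding itself. The sketch below is only an illustration of the general table-lookup idea: it is not the Fast Cofold algorithm, and the function name `four_russians_bool_matmul` and the block width of roughly log2(n) are assumptions made for this example. The rows of `B` are cut into blocks of about log2(n) rows, every possible OR-combination of the rows in a block is precomputed once, and each row of `A` then needs one table lookup per block instead of one operation per column.

```python
from math import log2


def four_russians_bool_matmul(A, B):
    """Boolean product of two n x n 0/1 matrices using block lookup tables.

    Illustrative sketch of the Four-Russians idea only; not Fast Cofold.
    """
    n = len(A)
    if n == 0:
        return []
    q = max(1, int(log2(n)))  # assumed block width, about log2(n)
    C = [0] * n  # row i of the result, packed into an integer bitmask

    for start in range(0, n, q):
        width = min(q, n - start)
        # Pack each row of B in this block as a bitmask (bit j set <=> B[k][j] == 1).
        rows = [
            sum(bit << j for j, bit in enumerate(B[start + t]))
            for t in range(width)
        ]
        # table[m] = OR of the block rows selected by the bits of m.
        # Each entry reuses a previously computed entry plus one more row.
        table = [0] * (1 << width)
        for m in range(1, 1 << width):
            low = m & -m  # lowest set bit of m
            table[m] = table[m ^ low] | rows[low.bit_length() - 1]
        # One lookup per (row of A, block) replaces `width` separate row ORs.
        for i in range(n):
            m = sum(A[i][start + t] << t for t in range(width))
            C[i] |= table[m]

    # Unpack the bitmasks back into 0/1 lists.
    return [[(C[i] >> j) & 1 for j in range(n)] for i in range(n)]


def naive_bool_matmul(A, B):
    """Reference implementation used only to check the sketch above."""
    n = len(A)
    return [
        [int(any(A[i][k] and B[k][j] for k in range(n))) for j in range(n)]
        for i in range(n)
    ]
```

Both functions return the same matrix for any pair of square 0/1 inputs; the lookup version simply does less work per row. This mirrors the abstract's point that the speedup leaves the result unchanged.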
