RNAalifold: improved consensus structure prediction for RNA alignments

Background: The prediction of a consensus structure for a set of related RNAs is an important first step for subsequent analyses. RNAalifold, which computes the minimum energy structure that is simultaneously formed by a set of aligned sequences, is one of the oldest and most widely used tools for this task. In recent years, several alternative approaches have been advocated, pointing to several shortcomings of the original RNAalifold approach.

Results: We show that the accuracy of RNAalifold predictions can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. These improvements are achieved without compromising the computational efficiency of the algorithm. We show that the new version of RNAalifold outperforms not only the old one but also several other recently developed tools on different datasets.

Conclusion: The new version of RNAalifold can replace the old one for almost any application, and it is also competitive with other approaches, including those based on SCFGs, maximum expected accuracy, or hierarchical nearest neighbor classifiers.
