A fast structural multiple alignment method for long RNA sequences

Background: Aligning multiple RNA sequences is essential for analyzing non-coding RNAs. Many alignment methods for non-coding RNAs have been proposed, including Sankoff's algorithm for strict structural alignment, but they are either inaccurate or computationally too expensive. Genome-scale analyses require faster methods with reasonable accuracy.

Results: We propose a fast algorithm for multiple structural alignment of RNA sequences that extends our pairwise structural alignment method (implemented in SCARNA). The accuracy of the implemented software, MXSCARNA, is at least as good as that of state-of-the-art algorithms that are computationally much more expensive in time and memory.

Conclusion: The proposed method for structural alignment of multiple RNA sequences is fast enough for large-scale analyses, with accuracy at least comparable to that of existing algorithms. The source code of MXSCARNA and its web server are available at http://mxscarna.ncrna.org.
