Backofen R: MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons

MOTIVATION: Because secondary structure matters when aligning functional RNAs, several pairwise sequence-structure alignment methods have been developed. They use extended alignment scores that evaluate secondary structure information in addition to sequence information. However, two problems remain for the multiple alignment step: first, how to combine pairwise sequence-structure alignments into a multiple alignment, and second, how to generate secondary structure information for sequences that lack explicit structural information.

RESULTS: We describe a novel approach for multiple alignment of RNAs (MARNA) that takes both the primary and the secondary structures into consideration. It is based on pairwise sequence-structure comparisons of RNAs. From these sequence-structure alignments, libraries of weighted alignment edges are generated. The weights reflect the sequential and structural conservation. For sequences whose secondary structures are missing, the libraries are generated by sampling low-energy conformations. The libraries are then processed by the T-Coffee system, a consistency-based multiple alignment method. Furthermore, we are able to extract a consensus sequence and a consensus structure from a multiple alignment. We have successfully tested MARNA on several datasets taken from the Rfam database.
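To make the idea of a library of weighted alignment edges more concrete, the following is a minimal sketch, not the MARNA implementation. It assumes a pairwise alignment is given as two gapped sequences plus two gapped dot-bracket structures over the same columns, and it uses a hypothetical weighting (`seq_w` for identical nucleotides, `struct_w` for a base pair conserved in both structures). The function names, the weights, and the in-memory dictionary are illustrative assumptions; the abstract specifies neither the actual scoring nor the library format passed to T-Coffee.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]  # (0-based position in sequence A, 0-based position in sequence B)


def pair_table(gapped_structure: str) -> Dict[int, int]:
    """Map each paired alignment column to its partner column (dot-bracket input)."""
    stack, partner = [], {}
    for col, ch in enumerate(gapped_structure):
        if ch == "(":
            stack.append(col)
        elif ch == ")":
            open_col = stack.pop()
            partner[col], partner[open_col] = open_col, col
    return partner


def alignment_edges(aln_a: str, aln_b: str, str_a: str, str_b: str,
                    seq_w: float = 1.0, struct_w: float = 2.0) -> Dict[Edge, float]:
    """Turn one pairwise sequence-structure alignment into weighted edges.

    The weights are a hypothetical stand-in for "sequential and structural
    conservation"; they are not the scores used by MARNA.
    """
    pairs_a, pairs_b = pair_table(str_a), pair_table(str_b)
    edges: Dict[Edge, float] = {}
    pos_a = pos_b = 0  # ungapped positions consumed so far
    for col, (a, b) in enumerate(zip(aln_a, aln_b)):
        if a != "-" and b != "-":
            weight = 0.0
            if a == b:
                weight += seq_w  # sequence conservation
            if col in pairs_a and pairs_a[col] == pairs_b.get(col):
                weight += struct_w  # same base pair present in both structures
            edges[(pos_a, pos_b)] = weight
        if a != "-":
            pos_a += 1
        if b != "-":
            pos_b += 1
    return edges


# Toy hairpins: B lacks one loop nucleotide relative to A.
edges = alignment_edges("GGGAAACCC", "GGC-AAGCC",
                        "(((...)))", "(((-..)))")
```

In this toy case the stem columns receive higher weights than the loop columns, and the gapped column contributes no edge. A consistency-based method would then combine such edge sets from all sequence pairs.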
