Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints

BackgroundWe are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm.ResultsWe use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment.ConclusionPairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses.

[1]  Gary D. Stormo,et al.  An RNA folding method capable of identifying pseudoknots and base triples , 1998, Bioinform..

[2]  N. Pace,et al.  Phylogenetic comparative analysis and the secondary structure of ribonuclease P RNA--a review. , 1989, Gene.

[3]  Peter F. Stadler,et al.  Alignment of RNA base pairing probability matrices , 2004, Bioinform..

[4]  R. Lück,et al.  ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure. , 1999, Nucleic acids research.

[5]  Ian Holmes,et al.  Pairwise RNA Structure Comparison with Stochastic Context-Free Grammars , 2001, Pacific Symposium on Biocomputing.

[6]  Burkhard Morgenstern,et al.  Exon discovery by genomic sequence alignment , 2002, Bioinform..

[7]  R. Gutell,et al.  Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. , 1994, Microbiological reviews.

[8]  Steven E. Brenner,et al.  Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison , 2002, Proc. IEEE.

[9]  Robert Giegerich,et al.  Explaining and Controlling Ambiguity in Dynamic Programming , 2000, CPM.

[10]  J. Steitz,et al.  The expanding universe of noncoding RNAs. , 2006, Cold Spring Harbor symposia on quantitative biology.

[11]  M. Zuker Calculating nucleic acid secondary structure. , 2000, Current opinion in structural biology.

[12]  M. Waterman,et al.  RNA secondary structure: a complete mathematical analysis , 1978 .

[13]  D. S. Fields,et al.  An analysis of large rRNA sequences folded by a thermodynamic method. , 1996, Folding & design.

[14]  R. Gutell,et al.  The accuracy of ribosomal RNA comparative structure models. , 2002, Current opinion in structural biology.

[15]  Gary D. Stormo,et al.  Finding Common Sequence and Structure Motifs in a Set of RNA Sequences , 1997, ISMB.

[16]  Walter Fontana,et al.  Fast folding and comparison of RNA secondary structures , 1994 .

[17]  Gary D. Stormo,et al.  Phylogenetically enhanced statistical tools for RNA structure prediction , 2000, Bioinform..

[18]  S. Eddy Non–coding RNA genes and the modern RNA world , 2001, Nature Reviews Genetics.

[19]  Hélène Touzet,et al.  Finding the common structure shared by two homologous RNAs , 2003, Bioinform..

[20]  Michael Zuker,et al.  Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information , 1981, Nucleic Acids Res..

[21]  Bjarne Knudsen,et al.  RNA secondary structure prediction using stochastic context-free grammars and evolutionary history , 1999, Bioinform..

[22]  Jeffrey D. Ullman,et al.  Introduction to Automata Theory, Languages and Computation , 1979 .

[23]  David H. Mathews,et al.  Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change , 2006, BMC Bioinformatics.

[24]  David H. Mathews,et al.  Predicting a set of minimal free energy RNA secondary structures common to two sequences , 2005, Bioinform..

[25]  D. Sankoff Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems , 1985 .

[26]  D. Turner,et al.  Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. , 2002, Journal of molecular biology.

[27]  David K. Y. Chiu,et al.  Inferring consensus structure from nucleic acid sequences , 1991, Comput. Appl. Biosci..

[28]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[29]  Sean R. Eddy,et al.  Rna structural alignment using stochastic context-free grammars , 2004 .

[30]  守屋 悦朗,et al.  J.E.Hopcroft, J.D. Ullman 著, "Introduction to Automata Theory, Languages, and Computation", Addison-Wesley, A5変形版, X+418, \6,670, 1979 , 1980 .

[31]  Serafim Batzoglou,et al.  The many faces of sequence alignment , 2005, Briefings Bioinform..

[32]  A. Wilm,et al.  A benchmark of multiple sequence alignment programs upon structural RNAs , 2005, Nucleic acids research.

[33]  G. Stormo,et al.  Discovering common stem-loop motifs in unaligned RNA sequences. , 2001, Nucleic acids research.

[34]  S. Muse Evolutionary analyses of DNA sequences subject to constraints of secondary structure. , 1995, Genetics.

[35]  Elena Rivas,et al.  Noncoding RNA gene detection using comparative sequence analysis , 2001, BMC Bioinformatics.

[36]  Jerrold R. Griggs,et al.  Algorithms for Loop Matchings , 1978 .

[37]  Laurie J. Heyer,et al.  Finding the most significant common sequence and structure motifs in a set of RNA sequences. , 1997, Nucleic acids research.

[38]  Sean R. Eddy,et al.  Rfam: annotating non-coding RNAs in complete genomes , 2004, Nucleic Acids Res..

[39]  Ian H. Holmes,et al.  Studies in Probabilistic Sequence Alignment and Evolution , 1998 .

[40]  Robert Giegerich,et al.  Effective ambiguity checking in biosequence analysis , 2005, BMC Bioinformatics.

[41]  D. Haussler,et al.  Using multiple alignments and phylogenetic trees to detect RNA secondary structure. , 1996, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[42]  Ian Holmes,et al.  Stem Stem Stem Stem Loop Loop Loop LoopLoop Loop Loop Loop Loop Loop Loop , 2005 .

[43]  Sean R. Eddy,et al.  Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction , 2004, BMC Bioinformatics.

[44]  Yves Van de Peer,et al.  The European database on small subunit ribosomal RNA , 2002, Nucleic Acids Res..

[45]  Bjarne Knudsen,et al.  Pfold: RNA Secondary Structure Prediction Using Stochastic Context-Free Grammars , 2003 .

[46]  Yves Van de Peer,et al.  The European Large Subunit Ribosomal RNA database , 2000, Nucleic Acids Res..

[47]  P. Stadler,et al.  Secondary structure prediction for aligned RNA sequences. , 2002, Journal of molecular biology.

[48]  J. Sabina,et al.  Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. , 1999, Journal of molecular biology.

[49]  Gary D. Stormo,et al.  Pairwise local structural alignment of RNA sequences with sequence similarity less than 40% , 2005, Bioinform..

[50]  V. Juan,et al.  RNA secondary structure prediction based on free energy and phylogenetic analysis. , 1999, Journal of molecular biology.

[51]  G. Stormo,et al.  Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. , 1992, Nucleic acids research.