RNA Sampler: a new sampling based algorithm for common RNA secondary structure prediction and structural alignment

MOTIVATION Non-coding RNA genes and RNA structural regulatory motifs play important roles in gene regulation and other cellular functions. They are often characterized by specific secondary structures that are critical to their functions and are often conserved in phylogenetically or functionally related sequences. Predicting common RNA secondary structures in multiple unaligned sequences remains a challenge in bioinformatics research.

METHODS AND RESULTS We present a new sampling-based algorithm to predict common RNA secondary structures in multiple unaligned sequences. For a pair of sequences, the algorithm finds the common structure by probabilistically sampling aligned stems according to their stem conservation, which is calculated from intrasequence base pairing probabilities and intersequence base alignment probabilities. It then updates these probabilities from the sampled structures and recalculates stem conservation with the updated probabilities. This iteration terminates when the sampled structures converge. We extend the algorithm to multiple sequences with a consistency-based method, which iteratively incorporates and reinforces consistent structure information from pairwise comparisons into consensus structures. The algorithm places no restriction on pseudoknots and can therefore predict them. In extensive testing on real sequence data, our algorithm outperformed other leading RNA structure prediction methods in both sensitivity and specificity while running reasonably fast. It also generated better structural alignments than other programs across sequences with a wide range of identities; these alignments more accurately represent the conservation of RNA secondary structure.

AVAILABILITY The algorithm is implemented in a C program, RNA Sampler, which is available at http://ural.wustl.edu/software.html
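
The pairwise sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the RNA Sampler implementation (which is a C program), and the abstract does not give the scoring formula. The sketch assumes, purely for illustration, that the conservation of an aligned stem is the average over its aligned base pairs of the product of the two base pairing probabilities and the two base alignment probabilities. The function names, the dictionary-based inputs, and the rule that a sampled stem excludes any candidate reusing one of its bases are all hypothetical. The iterative probability update and the convergence test are left out.

```python
import random


def stem_conservation(stem_a, stem_b, pair_prob_a, pair_prob_b, align_prob):
    """Assumed score for one aligned stem (not the published formula).

    stem_a, stem_b: tuples of base pairs (i, j) and (k, l), aligned in order.
    pair_prob_a, pair_prob_b: dicts mapping a base pair to its intrasequence
        pairing probability.
    align_prob: dict mapping (position in A, position in B) to the
        intersequence alignment probability.
    """
    n = min(len(stem_a), len(stem_b))
    if n == 0:
        return 0.0
    total = 0.0
    for (i, j), (k, l) in zip(stem_a, stem_b):
        total += (pair_prob_a.get((i, j), 0.0)
                  * pair_prob_b.get((k, l), 0.0)
                  * align_prob.get((i, k), 0.0)
                  * align_prob.get((j, l), 0.0))
    return total / n


def bases(stem):
    """Return the set of sequence positions used by a stem."""
    return {pos for pair in stem for pos in pair}


def sample_aligned_stems(candidates, scores, rng):
    """Sample a compatible set of aligned stems.

    Stems are drawn one at a time with probability proportional to their
    score. After each draw, every candidate that reuses a base of the drawn
    stem (in either sequence) is discarded. Nesting is not required, so
    crossing stems (pseudoknots) remain possible.
    """
    remaining = [(c, s) for c, s in zip(candidates, scores)
                 if s > 0.0 and c[0] and c[1]]
    chosen = []
    used_a, used_b = set(), set()
    while remaining:
        weights = [s for _, s in remaining]
        (stem_a, stem_b), _ = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append((stem_a, stem_b))
        used_a |= bases(stem_a)
        used_b |= bases(stem_b)
        # The drawn stem overlaps itself, so it is removed here as well.
        remaining = [(c, s) for c, s in remaining
                     if not (bases(c[0]) & used_a)
                     and not (bases(c[1]) & used_b)]
    return chosen
```

In this sketch a caller would score every candidate aligned stem with stem_conservation and pass the candidates, their scores, and a random.Random instance to sample_aligned_stems. Repeating the draw many times gives a sample of structures from which updated probabilities could be estimated.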
