Bayesian sampling of evolutionarily conserved RNA secondary structures with pseudoknots

MOTIVATION: Many non-coding RNAs are now known to play active roles in important biological processes. Because an RNA's function is correlated with specific structural motifs that are often conserved in phylogenetically related molecules, computational prediction of RNA structure should ideally be based on a set of homologous primary structures. However, many available RNA secondary structure prediction programs that use sequence alignments either do not consider pseudoknots or produce a single estimated structure without any information on its uncertainty.

RESULTS: In this article we present a method that uses the evolutionary history of a group of aligned RNA sequences to sample consensus secondary structures, including pseudoknots, according to their approximate posterior probability. We investigate the benefit of using evolutionary history and demonstrate, on RNase P RNA sequences and simulated data, that our method is competitive with similar methods.

AVAILABILITY: PhyloQFold, a C++ implementation of our method, is freely available from http://evol.bio.lmu.de/_statgen/software/phyloqfold/.
