RNA Modeling Using Gibbs Sampling and Stochastic Context Free Grammars

A new method is proposed for discovering the common secondary structure of a family of homologous RNA sequences using Gibbs sampling and stochastic context-free grammars. Given an unaligned set of sequences, a Gibbs sampling step simultaneously estimates the secondary structure of each sequence and a set of statistical parameters describing the common secondary structure of the set as a whole. These parameters constitute a statistical model of the family. Once Gibbs sampling has produced a crude statistical model for the family, this model is translated into a stochastic context-free grammar, which is then refined by an Expectation Maximization (EM) procedure to produce a more complete model. A prototype implementation of the method is tested on tRNA, pieces of 16S rRNA, and U5 snRNA, with good results.
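To make the notion of a stochastic context-free grammar for RNA concrete, the following is a minimal Python sketch, not the grammar or implementation used in this work. It assumes a hypothetical one-nonterminal toy grammar (S -> a S b for a base pair, S -> a S for an unpaired base, S -> empty) with invented probabilities, and uses the standard inside algorithm to sum the probability of a sequence over all secondary structures the grammar allows. The names `inside_probability`, `P_PAIR`, `P_LEFT`, `P_END`, `SINGLE` and `PAIR` are illustrative assumptions.

```python
from itertools import product

BASES = "ACGU"

# Hypothetical rule probabilities for the toy grammar (they sum to 1):
#   S -> a S b   emit a base pair around S
#   S -> a S     emit one unpaired base to the left of S
#   S -> (empty) stop
P_PAIR, P_LEFT, P_END = 0.5, 0.3, 0.2

# Assumed emission distributions, chosen only for illustration.
SINGLE = {b: 0.25 for b in BASES}
PAIR = {}
for a, b in product(BASES, repeat=2):
    if a + b in ("AU", "UA", "GC", "CG"):
        PAIR[a + b] = 0.22      # Watson-Crick pairs
    elif a + b in ("GU", "UG"):
        PAIR[a + b] = 0.03      # wobble pairs
    else:
        PAIR[a + b] = 0.006     # the remaining 10 combinations


def inside_probability(seq):
    """Probability that S derives seq, summed over all parses."""
    n = len(seq)
    # inside[i][j] holds the probability that S derives seq[i:j].
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        inside[i][i] = P_END    # empty span: only S -> (empty) applies
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # seq[i] is left unpaired.
            total = P_LEFT * SINGLE[seq[i]] * inside[i + 1][j]
            # seq[i] pairs with seq[j-1], enclosing seq[i+1:j-1].
            if length >= 2:
                total += P_PAIR * PAIR[seq[i] + seq[j - 1]] * inside[i + 1][j - 1]
            inside[i][j] = total
    return inside[0][n]
```

Under these assumed parameters, a sequence whose ends can form G-C pairs, such as `GGGAAACCC`, receives a higher probability than `GGGAAAGGG`. The toy grammar has no bifurcation rule and no right-side bulges, so it can only describe a single nested stem; a grammar suitable for real RNA families would need additional nonterminals and productions.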
