Recent Methods for RNA Modeling Using Stochastic Context-Free Grammars

Stochastic context-free grammars (SCFGs) can be applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. This paper discusses our new algorithm, Tree-Grammar EM, for deducing SCFG parameters automatically from unaligned, unfolded training sequences. Tree-Grammar EM, a generalization of the HMM forward-backward algorithm, is based on tree grammars and is faster than the previously proposed inside-outside SCFG training algorithm. Independently, Sean Eddy and Richard Durbin have introduced a trainable “covariance model” (CM) to perform similar tasks. We compare and contrast our methods with theirs.
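
As a rough illustration of how an SCFG assigns a probability to an RNA sequence, the sketch below computes the "inside" probability (the sum over all derivations) for a toy grammar. It is not Tree-Grammar EM and not the grammar used in the paper: the single nonterminal S, the rule set (S -> x S y for a base pair, S -> x S for an unpaired base, S -> empty), and all probability values are assumptions chosen only to keep the example small.

```python
# Toy SCFG with one nonterminal S (all rules and numbers are illustrative):
#   S -> x S y   base pair (x, y), probability PAIR_P[(x, y)]
#   S -> x S     unpaired base x,  probability LEFT_P[x]
#   S -> (empty) end of derivation, probability END_P
# The rule probabilities sum to 1: 4 * 0.15 + 4 * 0.075 + 0.1.
PAIR_P = {("a", "u"): 0.15, ("u", "a"): 0.15, ("g", "c"): 0.15, ("c", "g"): 0.15}
LEFT_P = {base: 0.075 for base in "acgu"}
END_P = 0.1


def inside_probability(seq):
    """Return P(S derives seq), summed over all derivations of the toy grammar."""
    n = len(seq)
    # inside[i][j] holds the probability that S derives seq[i:j].
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        inside[i][i] = END_P  # empty span: only S -> (empty) applies
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Case 1: seq[i] is emitted unpaired, S derives the rest.
            total = LEFT_P.get(seq[i], 0.0) * inside[i + 1][j]
            # Case 2: seq[i] pairs with seq[j-1], S derives the interior.
            if length >= 2:
                pair = (seq[i], seq[j - 1])
                total += PAIR_P.get(pair, 0.0) * inside[i + 1][j - 1]
            inside[i][j] = total
    return inside[0][n]


if __name__ == "__main__":
    for s in ["au", "aa", "gaaac"]:
        print(s, inside_probability(s))
```

Under these assumed parameters, a sequence whose ends can base-pair (such as "au") receives a higher probability than one of the same length that cannot (such as "aa"), which is the sense in which a grammar of this kind captures secondary structure that a position-by-position HMM does not.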
