Grammatical inference in bioinformatics

Bioinformatics is an active research area aimed at developing intelligent systems for the analysis of molecular biology data. Many methods based on formal language theory, statistical theory, and learning theory have been developed for modeling and analyzing biological sequences such as DNA, RNA, and proteins. In particular, grammatical inference methods are expected to uncover grammatical structures hidden in biological sequences. In this article, we give an overview of a series of our grammatical approaches to biological sequence analysis and of related research, focusing on learning stochastic grammars from biological sequences and predicting the functions of those sequences based on the learned stochastic grammars.
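Scoring a sequence against a stochastic grammar is the step that lets a learned grammar be used for prediction. The following is a minimal sketch under stated assumptions, not the method of this article. It uses a hypothetical one-nonterminal stochastic context-free grammar for RNA in which `S` either emits a base pair around a nested `S`, emits one unpaired base, or stops. `inside_probability` sums over all derivations with the inside algorithm, and `log_odds` compares a base-pairing model against a pairing-free background model. The names `ToySCFG`, `inside_probability`, and `log_odds` and all rule probabilities are illustrative assumptions. The grammar has no bifurcation rule, so it cannot represent multi-branch structures. Learning the rule probabilities (for example by Inside-Outside re-estimation) is not shown.

```python
import math
from dataclasses import dataclass

# Watson-Crick pairs plus G-U wobble pairs (illustrative choice).
PAIRS = frozenset({("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")})
BASES = frozenset("ACGU")


@dataclass(frozen=True)
class ToySCFG:
    """Hypothetical one-nonterminal SCFG (illustrative only):
    S -> x S y  with probability p_pair / len(PAIRS) for each pair (x, y) in PAIRS
    S -> x S    with probability p_left / 4 for each base x
    S -> eps    with probability p_end
    """
    p_pair: float
    p_left: float
    p_end: float

    def __post_init__(self) -> None:
        probs = (self.p_pair, self.p_left, self.p_end)
        if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
            raise ValueError("rule probabilities must be non-negative and sum to 1")


def inside_probability(seq: str, g: ToySCFG) -> float:
    """Return P(seq | g), summed over all derivations (inside algorithm)."""
    seq = seq.upper()
    if not set(seq) <= BASES:
        raise ValueError("sequence must contain only A, C, G, U")
    n = len(seq)
    # inside[i][j] = P(S =>* seq[i:j]); spans are filled in order of increasing length.
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        inside[i][i] = g.p_end  # empty span: S -> eps
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> x S: emit seq[i] unpaired, and S derives the rest of the span.
            total = g.p_left / 4 * inside[i + 1][j]
            # S -> x S y: pair the two ends of the span if they can base-pair.
            if length >= 2 and (seq[i], seq[j - 1]) in PAIRS:
                total += g.p_pair / len(PAIRS) * inside[i + 1][j - 1]
            inside[i][j] = total
    return inside[0][n]


def log_odds(seq: str, model: ToySCFG, background: ToySCFG) -> float:
    """Log-likelihood ratio; positive means `model` explains seq better.
    Assumes both models assign seq a nonzero probability."""
    return math.log(inside_probability(seq, model)) - math.log(inside_probability(seq, background))


# Illustrative parameter values, not learned from data.
hairpin_model = ToySCFG(p_pair=0.6, p_left=0.2, p_end=0.2)
unpaired_model = ToySCFG(p_pair=0.0, p_left=0.8, p_end=0.2)
print(log_odds("GGGGCCCC", hairpin_model, unpaired_model))  # positive: stem favors pairing model
print(log_odds("AAAAAAAA", hairpin_model, unpaired_model))  # negative: no pairs possible
```

Because the probabilities are multiplied directly, they underflow on long sequences. A practical implementation would work in log space or rescale the table.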
