Grammatical inference applied to linguistic modeling of biological regulation networks

We present a methodology based on grammatical inference algorithms applied to the linguistic modeling of biological regulation networks. The linguistic approach to regulation networks was proposed by Collado-Vides, who formally showed that context-sensitive languages are needed to represent such networks. Because learning context-sensitive languages is a difficult task, our methodology describes this class starting from languages of a simpler nature, which can be learned by well-established grammatical inference algorithms. In addition to the proposed methodology, we suggest promising directions for this research.
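As a toy illustration of describing a context-sensitive language through a simpler one, the sketch below recognizes the textbook language a^n b^n c^n by combining a regular "skeleton" with equality constraints on run lengths. This is not the methodology of this work: the choice of language, the name `accepts`, and the constraint check are assumptions made only for illustration.

```python
import re

# Regular "skeleton": a run of a's, then a run of b's, then a run of c's.
# This part alone is a regular language and is the easy part to learn.
SKELETON = re.compile(r"(a*)(b*)(c*)")


def accepts(word: str) -> bool:
    """Return True if word belongs to a^n b^n c^n (illustrative assumption).

    The word must match the regular skeleton, and the three run lengths
    must be equal. The equality constraint is what lifts the regular
    skeleton to a context-sensitive language.
    """
    match = SKELETON.fullmatch(word)
    if match is None:
        return False
    n_a, n_b, n_c = (len(group) for group in match.groups())
    return n_a == n_b == n_c
```

In this toy setting, the regular part and the numeric constraints are kept separate, so only the regular part would need to be handled by a standard inference algorithm.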
