The search for a grammatical theory of gene regulation is formally justified by showing the inadequacy of context-free grammars

No one questions the important practical contributions of computer science to molecular biology. It may well be that, one day, theoretical contributions will also become useful. One example of this kind of interdisciplinary research is the attempt to construct a grammatical theory of the regulation of gene expression. In this paper, I demonstrate that context-free grammars are inadequate for describing the regulatory properties encoded in DNA. This result is supported by data in the literature showing changes in the specificity of recognition between regulatory proteins and their DNA targets. It places an important limitation on the use of statistical approaches, such as information theory, as a source of inspiration for a theory of gene regulation. It also provides formal justification for the search for more elaborate grammatical models in the study of gene regulation. Some basic proposals for such a grammatical approach have been presented previously.
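The abstract does not spell out the formal construction. The toy sketch below illustrates the general kind of dependency that pushes a pattern beyond context-free power. It rests on a hypothetical encoding that is not taken from the paper. Each symbol stands for a recognition specificity. The part before `#` lists regulator genes in order, and the part after `#` lists their operator sites. If the pairings cross, so that the i-th regulator must match the i-th site, the strings form the copy language {w#w}, a textbook example of a language that is not context-free. If the pairings nest, the strings form the mirror language {w#reverse(w)}, which is context-free. The function names `accepts_nested` and `accepts_crossed` are illustrative assumptions.

```python
from collections import deque


def accepts_nested(s: str) -> bool:
    """Accept w#reverse(w) using a single stack (pushdown-style matching).

    Hypothetical encoding: symbols before '#' are regulator specificities,
    symbols after '#' are operator sites; pairs must nest (last-in, first-out).
    """
    if s.count("#") != 1:
        return False
    left, right = s.split("#")
    stack = list(left)  # push each regulator specificity in order
    for site in right:
        # The most recently pushed regulator must match the next site read.
        if not stack or stack.pop() != site:
            return False
    return not stack  # every regulator must have found its site


def accepts_crossed(s: str) -> bool:
    """Accept w#w using a FIFO queue (crossed, copy-language matching).

    Same hypothetical encoding, but regulator i must match site i,
    so pairs cross instead of nesting.
    """
    if s.count("#") != 1:
        return False
    left, right = s.split("#")
    queue = deque(left)  # regulators in the order they appear
    for site in right:
        # The earliest unmatched regulator must match the next site read.
        if not queue or queue.popleft() != site:
            return False
    return not queue


if __name__ == "__main__":
    print(accepts_crossed("ab#ab"))  # True: regulators a, b; sites a, b in the same order
    print(accepts_nested("ab#ab"))   # False: a single stack pairs them in reverse order
    print(accepts_nested("ab#ba"))   # True: nested pairing
```

The stack/queue contrast is only intuition. A single stack can pair symbols only in last-in-first-out order, which suits nested dependencies but not crossed ones. The formal proof that {w#w} is not context-free uses the pumping lemma for context-free languages, not this code.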
