Genetic algorithms and extraction of rules for detection of short DNA motifs

The paper presents a method for discovering a specific type of rule for detecting and extracting potentially biologically active DNA motifs from nucleotide databases. Each rule relates the signal strengths of two motifs to their mutual distance. Rule extraction is based on a genetic algorithm. The method is applied to, and tested on, the extraction of explicit rules that govern the relationship among three quantities: the TATA-box motif in eukaryotes, the signal associated with the [−40, +11] region relative to the transcription start site (TSS) of eukaryotic promoters, and the distance between the TATA motif and the TSS. The extracted rules are shown to discriminate very well between 'presumed biologically functional' TATA motifs and de facto non-functional (pseudo) TATA motifs.
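As an illustration only, the kind of rule described above can be sketched as a pair of thresholds on two signal strengths plus a window on their distance, with a simple genetic algorithm searching for thresholds that separate functional from pseudo motifs. Everything in this sketch is an assumption rather than the paper's actual formulation: the rule encoding, the names `rule_accepts`, `fitness`, and `evolve`, the normalisation of scores to [0, 1], the balanced-accuracy fitness, and the GA operators (elitist selection, uniform crossover, Gaussian mutation).

```python
import random

# Hypothetical rule encoding: (min_tata, min_tss, d_lo, d_hi).
# A sample is (tata_score, tss_score, distance), with scores assumed to be
# normalised to [0, 1] and distance given in bp relative to the TSS.
def rule_accepts(rule, sample):
    min_tata, min_tss, d_lo, d_hi = rule
    tata_score, tss_score, distance = sample
    return (tata_score >= min_tata and tss_score >= min_tss
            and d_lo <= distance <= d_hi)

def fitness(rule, positives, negatives):
    # Balanced accuracy: mean of the true-positive and true-negative rates.
    tp = sum(rule_accepts(rule, s) for s in positives)
    tn = sum(not rule_accepts(rule, s) for s in negatives)
    return 0.5 * (tp / len(positives) + tn / len(negatives))

def random_rule(rng):
    d_lo = rng.randint(-60, -10)
    return (rng.random(), rng.random(), d_lo, d_lo + rng.randint(1, 30))

def crossover(a, b, rng):
    # Uniform crossover, then reorder the window so that d_lo <= d_hi.
    child = [rng.choice(pair) for pair in zip(a, b)]
    lo, hi = min(child[2], child[3]), max(child[2], child[3])
    return (child[0], child[1], lo, hi)

def mutate(rule, rng):
    min_tata, min_tss, d_lo, d_hi = rule
    min_tata = min(1.0, max(0.0, min_tata + rng.gauss(0, 0.05)))
    min_tss = min(1.0, max(0.0, min_tss + rng.gauss(0, 0.05)))
    d_lo += rng.randint(-2, 2)
    d_hi = max(d_lo, d_hi + rng.randint(-2, 2))
    return (min_tata, min_tss, d_lo, d_hi)

def evolve(positives, negatives, generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_rule(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged and refill with mutated offspring.
        population.sort(key=lambda r: fitness(r, positives, negatives),
                        reverse=True)
        elite = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=lambda r: fitness(r, positives, negatives))

if __name__ == "__main__":
    # Toy, made-up samples used only to exercise the code.
    functional = [(0.90, 0.80, -30), (0.80, 0.70, -28), (0.85, 0.90, -32)]
    pseudo = [(0.90, 0.20, -30), (0.30, 0.80, -29), (0.80, 0.80, -5)]
    best = evolve(functional, pseudo)
    print(best, fitness(best, functional, pseudo))
```

The best rule found can be read off directly as explicit thresholds and a distance window, which is the sense in which such rules are "explicit"; real use would replace the toy samples with scores computed from annotated promoter sequences.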
