A Hybrid Promoter Analysis Methodology for Prokaryotic Genomes

One of the major challenges of the post-genomic era is identifying regulatory systems and integrating them into genetic networks. Gene expression is determined by protein–protein interactions among regulatory proteins and with RNA polymerase(s), and by protein–DNA interactions of these trans-acting factors with cis-acting DNA sequences in the promoter regions of the regulated genes. Identifying these protein–DNA interactions, by means of the DNA motifs that characterize the regulatory factors operating in the transcription of a gene, is therefore crucial for determining which genes participate in a regulatory process, how they behave, and how they are connected to build genetic networks. In this paper, we propose a hybrid promoter analysis methodology (HPAM) to discover complex promoter motifs. It combines the efficiency of neural networks and their ability to represent imprecise and incomplete patterns; the flexibility and interpretability of fuzzy models; and the capability of multi-objective evolutionary algorithms to identify optimal instances of a model by searching according to multiple criteria. We test our methodology by learning and predicting the RNA polymerase motif in prokaryotic genomes. This constitutes a special challenge because of the multiplicity of RNA polymerase targets and the polymerase's connectivity with other transcription factors, which sometimes require multiple functional binding sites even in closely located regulatory regions, and because of the uncertainty of its motif, which allows sites with low specificity (i.e., differing from the best alignment or consensus) to still be functional. HPAM is available for public use at http://soar-tools.wustl.edu.

V. Cotik et al. / Fuzzy Sets and Systems 152 (2005) 83–102. doi:10.1016/j.fss.2004.10.016. E-mail addresses: vcotik@dc.uba.ar (V. Cotik), rromero@dc.uba.ar (R. Romero Zaliz), zwir@borcim.wustl.edu, zwir@decsai.ugr.es (I. Zwir).
© 2004 Elsevier B.V. All rights reserved.
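
The two ideas the abstract leans on, graded (rather than exact) agreement with a consensus motif and search under multiple criteria, can be illustrated with a minimal Python sketch. This is not HPAM's implementation: the function names are hypothetical, the graded score is a plain fraction of matching positions rather than the paper's fuzzy models, and the only outside fact assumed is the textbook E. coli sigma70 -10 consensus TATAAT.

```python
# Illustrative sketch only; names and scoring are assumptions, not HPAM's method.

def match_degree(site, consensus):
    """Graded agreement in [0, 1]: fraction of positions equal to the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a == b for a, b in zip(site, consensus)) / len(consensus)

def best_match(seq, consensus):
    """Return (start, degree) of the best-scoring window; earliest start wins ties."""
    k = len(consensus)
    starts = range(len(seq) - k + 1)
    start = max(starts, key=lambda i: match_degree(seq[i:i + k], consensus))
    return start, match_degree(seq[start:start + k], consensus)

def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the candidates that no other candidate dominates (all maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

if __name__ == "__main__":
    # A site one mismatch away from the consensus still scores highly.
    print(match_degree("TAGAAT", "TATAAT"))
    print(best_match("GGTATAATGG", "TATAAT"))
    # Hypothetical candidates scored on two criteria, e.g. two motif elements.
    print(pareto_front([(1.0, 0.5), (0.5, 1.0), (0.5, 0.5)]))
```

A graded score lets a site that differs from the consensus remain a candidate instead of being rejected outright, and the Pareto filter keeps every candidate that is best under some trade-off between criteria rather than collapsing them into a single weighted score.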
