Prediction of transcriptional regulatory sites in the complete genome sequence of Escherichia coli K-12

MOTIVATION: As one of the best-characterized free-living organisms, Escherichia coli, with its recently completed genome sequence, offers a special opportunity to systematically exploit the variety of regulatory data available in the literature and to make a comprehensive set of regulatory predictions across the whole genome.

RESULTS: The complete genome sequence of E. coli was analyzed for binding sites of transcriptional regulators upstream of coding sequences. The biological information contained in RegulonDB (Huerta, A.M. et al., Nucleic Acids Res., 26, 55-60, 1998) for 56 different transcriptional proteins served as the basis for a stringent strategy combining string search and weight matrices. We estimate that our search included representatives of 15-25% of all regulatory DNA-binding proteins in E. coli. The search was performed on a set of 4288 putative regulatory regions, each 450 bp long. Of the regions with predicted sites, 89% are regulated by a single protein and 81% involve a single site; these proportions are reasonably consistent with the distribution of experimentally characterized regulatory sites. Regulatory sites were found in 603 regions, corresponding to 16% of operon regions and 10% of intra-operonic regions. Additional evidence gives stronger support to some of these predictions, including the position of the site, biological consistency with the function of the downstream gene, and genetic evidence for the regulatory interaction. The predictions described here were incorporated into the map presented in the paper describing the complete E. coli genome (Blattner, F.R. et al., Science, 277, 1453-1461, 1997).

AVAILABILITY: The complete set of predictions in GenBank format is available at http://www.cifn.unam.mx/Computational_Biology/E.coli-predictions

CONTACT: ecoli-reg@cifn.unam.mx, collado@cifn.unam.mx
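To make the search strategy concrete (string search combined with weight-matrix scoring over fixed-length windows upstream of coding sequences), here is a minimal Python sketch. It is not the authors' implementation. The 400 bp upstream / 50 bp downstream split of the 450 bp window, the pseudocount of 0.5, the uniform 0.25 background, the mismatch limit, the score threshold, the toy sites, and all function names (`regulatory_region`, `log_odds_matrix`, `consensus`, `scan`) are hypothetical choices made for illustration. The weight matrix used here is a standard log2-odds position weight matrix built from aligned known sites.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def regulatory_region(genome: str, start: int, end: int, strand: str,
                      upstream: int = 400, downstream: int = 50) -> str:
    """Cut a window around a gene start, oriented 5'->3' on the gene's strand.

    start/end are 0-based, half-open CDS coordinates on the forward strand.
    The 400/50 split of a 450 bp window is a hypothetical choice.
    """
    if strand == "+":
        # Gene starts at `start`: take `upstream` bp before it and `downstream` bp after.
        return genome[max(0, start - upstream):min(len(genome), start + downstream)]
    # Minus-strand gene starts at `end`; its upstream lies to the right on the forward strand.
    window = genome[max(0, end - downstream):min(len(genome), end + upstream)]
    return reverse_complement(window)


def log_odds_matrix(sites: List[str], pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Build a log2-odds weight matrix from aligned, equal-length known sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(width):
        column = {}
        for b in BASES:
            count = sum(1 for s in sites if s[i] == b)
            # Smoothed frequency divided by background probability, in bits.
            column[b] = math.log2((count + pseudocount) / total / background)
        matrix.append(column)
    return matrix


def consensus(sites: List[str]) -> str:
    """Most frequent base per column (ties broken in A, C, G, T order)."""
    return "".join(max(BASES, key=lambda b: sum(s[i] == b for s in sites))
                   for i in range(len(sites[0])))


def scan(region: str, sites: List[str], threshold: float,
         max_mismatches: int = 2) -> List[Tuple[int, str, str, float]]:
    """Report words on either strand that pass BOTH the string and matrix filters.

    Returns (start on the region's forward strand, strand, word, rounded score).
    """
    matrix = log_odds_matrix(sites)
    cons = consensus(sites)
    width = len(cons)
    hits = []
    for strand, seq in (("+", region), ("-", reverse_complement(region))):
        for i in range(len(seq) - width + 1):
            word = seq[i:i + width]
            # String search: tolerate at most `max_mismatches` differences from the consensus.
            if sum(a != b for a, b in zip(word, cons)) > max_mismatches:
                continue
            # Weight-matrix score; any non-ACGT base makes the word unscorable.
            score = sum(col.get(b, -math.inf) for col, b in zip(matrix, word))
            if score >= threshold:
                # Map minus-strand offsets back to forward-strand coordinates.
                pos = i if strand == "+" else len(region) - i - width
                hits.append((pos, strand, word, round(score, 2)))
    return hits


# Toy, made-up aligned sites (not real regulator data).
known_sites = ["TTGACA", "TTGACA", "TTGATA", "TCGACA"]
print(scan("GGGGGTTGACAGGGGG", known_sites, threshold=6.0))
# prints [(5, '+', 'TTGACA', 8.78)]
```

Requiring both a near-consensus string match and a high matrix score is one way to make such a search stringent: the string filter discards words far from the consensus, and the matrix score then keeps only candidates that closely match the base preferences observed in the known sites. In this toy run, the minus-strand word `CTGTCA` passes the two-mismatch string filter but scores only about 2.44 bits, so the matrix threshold rejects it.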

[1] G. D. Stormo et al. Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput. Appl. Biosci., 1990.

[2] D. Thieffry et al. Syntactic recognition of regulatory regions in Escherichia coli. Comput. Appl. Biosci., 1996.

[3] M. O'Neill et al. Consensus methods for finding and ranking DNA binding sites: application to Escherichia coli promoters. J. Mol. Biol., 1989.

[4] G. Stormo et al. Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation stati… 1995.

[5] J. Collado-Vides et al. Grammatical model of the regulation of gene expression. Proc. Natl. Acad. Sci. USA, 1992.

[6] D. Thieffry et al. Definite-clause grammars for the analysis of cis-regulatory regions in E. coli. Pac. Symp. Biocomput., 1997.

[7] T. D. Schneider et al. Information content of binding sites on nucleotide sequences. J. Mol. Biol., 1986.

[8] B. Magasanik et al. Global regulation of gene expression. Proc. Natl. Acad. Sci. USA, 2000.

[9] B. Dujon. The yeast genome project: what did we learn? Trends Genet., 1996.

[10] P. D. Karp et al. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res., 1999.

[11] W. McClure et al. Searching for and predicting the activity of sites for DNA binding proteins: compilation and analysis of the binding sites for Escherichia coli integration host factor (IHF). Nucleic Acids Res., 1990.

[12] R. Staden. Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res., 1984.

[13] D. Thieffry et al. RegulonDB: a database on transcriptional regulation in Escherichia coli. Nucleic Acids Res., 1998.

[14] M. Riley et al. Functions of the gene products of Escherichia coli. Microbiol. Rev., 1993.

[15] F. Blattner et al. Global regulation of gene expression in Escherichia coli. J. Bacteriol., 1993.

[16] D. Thieffry et al. RegulonDB: a database on transcriptional regulation. 1998.

[17] F. Neidhardt et al. Gene-protein database of Escherichia coli K-12: edition 3. Electrophoresis, 1990.

[18] N. W. Davis et al. The complete genome sequence of Escherichia coli K-12. Science, 1997.

[19] M. A. Savageau et al. Design of molecular control mechanisms and the demand for gene expression. Proc. Natl. Acad. Sci. USA, 1977.

[20] T. Werner et al. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res., 1995.

[21] R. D. Appel et al. The SWISS-2DPAGE database of two-dimensional polyacrylamide gel electrophoresis, its status in 1995. Nucleic Acids Res., 1996.