Evolution of transcription factor DNA binding sites.

In bioinformatics, the binding of transcription regulatory factors to their cognate binding sites is usually described by a sequence-specific binding energy, which is estimated from a training sample of sites. This model implies that all sites with binding energy above some threshold are functional, and that variations in site sequence should be considered neutral as long as they do not reduce this energy below the threshold. To quantify the energy, either a positional weight matrix (PWM, or binding profile) model or a consensus-based model is usually applied. Here we show that in many cases the available data are not sufficient to construct an adequate PWM, and that a modified consensus-based model may describe binding properties more effectively. Further, using data on the binding sites of several transcription factors, we demonstrate that some non-consensus nucleotides in "orthologous sites" (that is, binding sites of the same factor upstream of orthologous genes), which have been believed to be irrelevant or even to hinder regulation, are evolutionarily very stable and specific to the regulated gene. For each pair of genomes considered, the number of substitutions between non-consensus nucleotides is far smaller than the expected number of neutral substitutions. Moreover, at several positions of binding sites regulating different genes, non-consensus nucleotides are conserved across distant genomes. This indicates that a selection pressure exists which maintains the stability of non-consensus nucleotides.
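The contrast between the two scoring schemes can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' method: the PWM uses log2-odds weights against a uniform background with a pseudocount of 0.5, the consensus rule is a plain mismatch count (not the modified consensus-based model mentioned above, whose details are not given in this abstract), both thresholds are set so that every training site is accepted, and the training sites are hypothetical toy 6-mers.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=0.5):
    """Log2-odds PWM against a uniform (0.25) background.

    Pseudocount and background are illustrative assumptions. With few
    training sites, unseen nucleotides get large penalties whose size
    depends mostly on the pseudocount.
    """
    n = len(sites)
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({
            b: math.log2((counts[b] + pseudocount) / (n + 4 * pseudocount) / 0.25)
            for b in ALPHABET
        })
    return pwm

def pwm_score(pwm, seq):
    """Additive, energy-like score: sum of per-position weights."""
    return sum(column[b] for column, b in zip(pwm, seq))

def consensus(sites):
    """Most frequent nucleotide at each position (ties: first seen)."""
    return "".join(
        Counter(site[i] for site in sites).most_common(1)[0][0]
        for i in range(len(sites[0]))
    )

def mismatches(cons, seq):
    """Number of positions where seq differs from the consensus."""
    return sum(a != b for a, b in zip(cons, seq))

# Hypothetical toy training sites (not data from the study).
training = ["TTGACA", "TTGACT", "TTTACA", "ATGACA"]

pwm = build_pwm(training)
cons = consensus(training)

# Thresholds chosen so that every training site is accepted.
pwm_threshold = min(pwm_score(pwm, s) for s in training)
max_mismatches = max(mismatches(cons, s) for s in training)

def is_site_pwm(seq):
    return pwm_score(pwm, seq) >= pwm_threshold

def is_site_consensus(seq):
    return mismatches(cons, seq) <= max_mismatches

for candidate in ["TTGACA", "TTGACG", "GGGGGG"]:
    print(candidate, is_site_pwm(candidate), is_site_consensus(candidate))
```

Running it prints `TTGACA True True`, `TTGACG False True` and `GGGGGG False False`. With only four training sites, the PWM rejects `TTGACG` because G never occurs at the last position in the sample, and the size of that penalty is set mainly by the pseudocount; the mismatch rule tolerates it. This is one way a small training sample can make a PWM less robust than a consensus-based rule.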

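The conservation comparison can be sketched in a similar way. This is a hypothetical operationalization, not the procedure used in the study: for aligned pairs of orthologous sites from two genomes, it counts positions where the first genome's site carries a non-consensus nucleotide (`n`) and how many of those positions differ in the second genome (`k`), then compares `k` with a binomial model in which each such position changes independently with an assumed neutral substitution probability `p_neutral` (which in practice might be estimated from, for example, intergenic or synonymous positions). The function names, the binomial test, and the toy sequences are illustrative assumptions.

```python
from math import comb

def nonconsensus_substitutions(cons, sites_a, sites_b):
    """Return (n, k): n = positions where a genome-A site differs from the
    consensus; k = how many of those positions differ in the orthologous
    genome-B site. Sites are assumed pre-aligned and of equal length."""
    n = k = 0
    for site_a, site_b in zip(sites_a, sites_b):
        for c, x, y in zip(cons, site_a, site_b):
            if x != c:
                n += 1
                if y != x:
                    k += 1
    return n, k

def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of seeing at most k
    substitutions if each of the n positions evolved neutrally."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical toy data (not from the study).
cons = "TTGACA"
sites_a = ["TTTACA", "ATGACT", "TTGACG"]
sites_b = ["TTTACA", "ATGACT", "TTGACC"]
p_neutral = 0.5  # assumed neutral substitution probability per position

n, k = nonconsensus_substitutions(cons, sites_a, sites_b)
print(n, k, n * p_neutral, binomial_lower_tail(k, n, p_neutral))
```

For these toy pairs it prints `4 1 2.0 0.3125`: one observed substitution against two expected. Real data would need many more sites; a small lower-tail probability would indicate fewer substitutions at non-consensus positions than neutral evolution predicts.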