On the Effects of Transcription Factor Properties on the Information Content of Binding Sites

Networks of genes that encode transcription factors (regulatory networks) play a central role in realizing phenotypic traits from genetic information. Sequence-specific recognition of DNA subsequences by proteins is a key mechanism underlying these networks. Understanding the information-theoretic principles that govern the evolution of transcription factors and their binding sites is therefore a major challenge in bioinformatics [1]. Advances in this field are expected to provide a basis for improving algorithmic binding site identification and promoter analysis [2], and for deciphering regulatory codes. Previous studies [3] have suggested that the information content deduced from sets of binding site sequences (Rsequence) approximately equals the information content deduced from the relative abundance of binding sites (Rfrequency). Here, we investigate the relation between these two information measures using a maximum entropy approach.
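
To make the two quantities concrete, the following is a minimal sketch, assuming the commonly used definitions from Schneider's work: Rsequence is the sum over aligned site positions of (2 - H_l) bits, where H_l is the Shannon entropy of the bases observed at position l, and Rfrequency is log2(G / gamma) for a genome of G possible positions containing gamma sites. The function names are hypothetical, the background is assumed to be equiprobable (2 bits per base), and the small-sample correction used in practice is omitted. This is not the maximum entropy approach of this study.

```python
import math
from collections import Counter


def r_sequence(sites):
    """Rsequence in bits for a list of aligned, equal-length sites.

    Assumes an equiprobable background (2 bits per position) and
    omits the small-sample correction used in practice.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    n = len(sites)
    total = 0.0
    for pos in range(length):
        # Base frequencies observed at this aligned position.
        counts = Counter(s[pos] for s in sites)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Information at this position = background entropy - observed entropy.
        total += 2.0 - h
    return total


def r_frequency(genome_size, num_sites):
    """Rfrequency in bits: log2(G / gamma)."""
    return math.log2(genome_size / num_sites)


if __name__ == "__main__":
    # Toy, hypothetical inputs for illustration only.
    print(r_sequence(["ACGT", "ACGT", "ACGA", "ACGA"]))  # 7.0
    print(r_frequency(1024, 4))  # 8.0
```

In the toy alignment, three fully conserved positions contribute 2 bits each and the last position (A or T with equal frequency) contributes 1 bit. The observation attributed to [3] is that, for real binding site sets, these two values come out approximately equal.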

[1] K. Frech et al. Software for the analysis of DNA sequence elements of transcription, 1997, Comput. Appl. Biosci.

[2] D. S. Fields et al. Specificity, free energy and information content in protein-DNA interactions, 1998, Trends in Biochemical Sciences.

[3] T. D. Schneider et al. Evolution of biological information, 2000, Nucleic Acids Research.