CONTEXT FEATURES OF TRANSCRIPTION FACTOR BINDING SITE SEQUENCES: RELATION TO DNA-BINDING DOMAIN CLASSIFICATION

Summary

Motivation: Classification of eukaryotic transcription factor binding sites (TFBS) by the context features of their DNA sequences is important for the analysis of gene transcription regulation. The growing volume of data on gene regulatory sequences makes it possible to reveal new statistical regularities governing DNA binding and the regulation of gene expression.

Results: We search for sequence constraints related to text complexity in the core regions of transcription factor binding sites. The nucleotide content of protein-binding sequences is related to the classification of DNA-binding domains. These findings suggest new approaches to TFBS classification and clustering.

Availability: The software is available at http://wwwmgs.bionet.nsc.ru/mgs/programs/low_complexity and the database at http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/; results and supplementary materials are available from the authors on request.
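To make the notion of text complexity of a binding site sequence concrete, the following is a minimal sketch that scores a DNA string by the Shannon entropy of its k-mer composition. This is an assumption chosen for illustration: the summary does not state which complexity measure is used, and the function name `kmer_entropy`, the parameter `k`, and the example sequences are hypothetical and are not taken from the software listed above.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the k-mer composition of a sequence.

    Illustrative complexity proxy only; not the measure used by the
    authors' software, which is not specified in the summary.
    """
    seq = seq.upper()
    # Collect all overlapping k-mers of the sequence.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    # H = -sum(p * log2(p)) over the observed k-mer frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical example inputs: a homopolymer run and a mixed sequence.
low = kmer_entropy("AAAAAAAA")
high = kmer_entropy("ACGTACGT")
print(f"homopolymer: {low:.3f} bits, mixed: {high:.3f} bits")
```

Under this proxy, a low score marks a compositionally simple site and a score near 2 bits (for k = 1) marks a site that uses all four nucleotides evenly. Comparing such scores across sites grouped by DNA-binding domain class is one way the relation described above could be explored.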