The SiteGA Tool for Recognition and Context Analysis of Transcription Factor Binding Sites: Significant Dinucleotide Features Besides the Canonical Consensus Exemplified by SF-1 Binding Site

Computational methods for finding transcription factor binding sites (TFBSs) are important for investigating the regulatory regions of eukaryotic genes and for genome annotation. We propose SiteGA, a method for recognizing TFBSs, and illustrate it with the search for SF-1 binding sites. SiteGA uses a genetic algorithm (GA) that performs an iterative discriminant analysis of local dinucleotide context characteristics. These characteristics are compiled not only over the core binding site (BS) region but also over its flanks. The main advance of this approach is improved recognition accuracy, achieved by using a large window that captures meaningful context features beyond the canonical consensus. Experimental verification confirmed most of the predicted sites. The program SiteGA is available at http://wwwmgs2.bionet.nsc.ru/mgs/programs/sitega/.
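To make the idea of "local dinucleotide context characteristics" plus "discriminant analysis" more concrete, the sketch below computes, for each aligned sequence window, the frequency of a given dinucleotide within a chosen subregion (which may lie in the core or in a flank), and then fits a simple linear discriminant that separates known sites from background windows. This is a minimal illustration under stated assumptions, not SiteGA itself: the feature definition as a (dinucleotide, start, length) triple, the diagonal discriminant (which ignores feature covariances), and all function names are hypothetical, and the GA step that would search for the best set of features is omitted.

```python
from itertools import product

# All 16 possible dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def local_dinucleotide_freq(seq, dinuc, start, length):
    """Fraction of positions in seq[start:start+length] where `dinuc` begins.

    A dinucleotide at position i spans region[i:i+2], so only positions whose
    second base also lies inside the region are counted.
    """
    if dinuc not in DINUCLEOTIDES:
        raise ValueError(f"not a DNA dinucleotide: {dinuc!r}")
    region = seq[start:start + length]
    n = len(region) - 1
    if n <= 0:
        return 0.0
    hits = sum(1 for i in range(n) if region[i:i + 2] == dinuc)
    return hits / n


def feature_vector(seq, features):
    """Evaluate a list of hypothetical (dinucleotide, start, length) features."""
    return [local_dinucleotide_freq(seq, d, s, l) for d, s, l in features]


def mean_var(values):
    """Mean and population variance of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


def fit_diagonal_discriminant(sites, background, features, eps=1e-6):
    """Fit weights w_j = (mean_site - mean_bg) / (pooled variance + eps).

    This is a diagonal (naive) linear discriminant, a simplification chosen
    for illustration; it is not necessarily the analysis SiteGA performs.
    """
    site_x = [feature_vector(s, features) for s in sites]
    bg_x = [feature_vector(s, features) for s in background]
    weights, midpoints = [], []
    for j in range(len(features)):
        m1, v1 = mean_var([x[j] for x in site_x])
        m0, v0 = mean_var([x[j] for x in bg_x])
        weights.append((m1 - m0) / ((v1 + v0) / 2 + eps))
        midpoints.append((m1 + m0) / 2)
    return weights, midpoints


def score(seq, features, weights, midpoints):
    """Positive scores lean toward 'site', negative toward 'background'."""
    x = feature_vector(seq, features)
    return sum(w * (xi - c) for w, xi, c in zip(weights, x, midpoints))


if __name__ == "__main__":
    # Toy data only: one feature looking for "TC" in positions 2..4.
    features = [("TC", 2, 3)]
    sites = ["GGTCAAGG", "GGTCAAGG"]
    background = ["ATATATAT", "TATATATA"]
    w, c = fit_diagonal_discriminant(sites, background, features)
    print(score("GGTCAAGG", features, w, c) > 0)   # site-like window
    print(score("ATATATAT", features, w, c) < 0)   # background-like window
```

In a GA-based approach such as the one the abstract describes, each individual would encode a set of such features (which dinucleotides, and which subregions of the large window), and the fitness would reflect how well the resulting discriminant separates sites from non-sites; here the feature set is simply fixed by hand.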