Target Prediction of Transcription Factors: Refinement of Structure-Based Method

Gene regulation in higher organisms is achieved by a complex system of transcription factors. The availability of complete genome sequences for many organisms has opened up the possibility of systematic analysis of gene regulation at the genome level. Transcription factors usually bind to multiple target sequences and regulate multiple genes in a complex manner. Because there are so many transcription factors and targets, it will be difficult to analyze such a system by experiment alone, and bioinformatics should play an important role in elucidating the mechanism of gene regulation. In particular, finding the target genes of transcription factors at the genome level will lay a basis for the analysis of gene regulatory networks. We have been developing methods for predicting the target sequences of transcription factors. The structure-based method, which utilizes structural data of protein-DNA complexes, is one of the promising methods for target prediction, as reported earlier. Here we describe a refinement of the structure-based method that includes the indirect readout mechanism as well as the direct readout mechanism of protein-DNA recognition. We show that both the direct and indirect readout mechanisms contribute significantly to the specificity of protein-DNA recognition, and that combining the two mechanisms increases the accuracy of target prediction.
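
As a rough illustration of how two readout terms might be combined into a single specificity score, the following minimal sketch sums a direct term (base contacts) and an indirect term (sequence-dependent DNA conformation) and expresses the result as a Z-score against random sequences. Everything in it is an assumption made for illustration: the energy tables, the set of "stiff" dinucleotide steps, the simple additive combination, and all function names are hypothetical and are not the potentials or the procedure used in this work.

```python
import random
import statistics

# Hypothetical toy parameters, NOT values from the paper.
# Direct readout: per-position energy of each base contacting the protein.
DIRECT = [
    {"A": -1.0, "C": 0.2, "G": 0.1, "T": 0.3},
    {"A": 0.3, "C": -0.8, "G": 0.2, "T": 0.1},
    {"A": 0.1, "C": 0.2, "G": -1.2, "T": 0.4},
    {"A": 0.2, "C": 0.1, "G": 0.3, "T": -0.9},
]
# Indirect readout: assume these dinucleotide steps are costly to deform.
STIFF_STEPS = {"GC", "CG", "GG", "CC"}


def direct_energy(seq):
    """Sum of per-position base-contact energies (toy direct readout)."""
    return sum(DIRECT[i][base] for i, base in enumerate(seq))


def indirect_energy(seq):
    """Sum of per-step conformational energies (toy indirect readout)."""
    steps = (seq[i:i + 2] for i in range(len(seq) - 1))
    return sum(0.5 if step in STIFF_STEPS else -0.2 for step in steps)


def combined_energy(seq):
    """Assumed combination: a plain sum of the two readout terms."""
    return direct_energy(seq) + indirect_energy(seq)


def random_background(length, n, seed=0):
    """Generate n random DNA sequences of the given length."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]


def z_score(energy_fn, target, background):
    """Z-score of the target energy relative to background sequences.

    A more negative value means the target is more favorable than random.
    """
    energies = [energy_fn(s) for s in background]
    mean = statistics.mean(energies)
    sd = statistics.pstdev(energies)
    return (energy_fn(target) - mean) / sd


if __name__ == "__main__":
    background = random_background(length=4, n=500)
    for name, fn in [("direct", direct_energy),
                     ("indirect", indirect_energy),
                     ("combined", combined_energy)]:
        print(name, round(z_score(fn, "ACGT", background), 2))
```

Reporting each term's Z-score alongside the combined one makes it possible to see how much each readout mechanism contributes to the specificity of a candidate target before the two are merged.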