Systematic Classification and Analysis of Themes in Protein-DNA Recognition

Protein-DNA recognition plays a central role in the regulation of gene expression. With the rapid increase in the number of protein-DNA complex structures available at atomic resolution, a systematic, complete, and intuitive framework is needed to clarify the intrinsic relationships among the global binding modes of these complexes. In this work, we modified and extended previously defined RNA-recognition themes and applied them to describe protein-DNA recognition. Using a protocol that combines automatic methods with manual inspection, we constructed a comprehensive classification tree for currently available high-quality protein-DNA structures. A nonredundant (representative) data set of 200 thematically diverse complexes was then extracted from the leaves of the classification tree using a locally sensitive interface comparison algorithm. On the basis of this representative data set, we analyzed various physical and chemical properties associated with protein-DNA interactions using empirical or semiempirical methods. We also examined the individual energetic components involved in protein-DNA interactions and highlighted the importance of conformational entropy, which has been almost completely ignored in previous studies of protein-DNA binding energy.