Predicting the binding preference of transcription factors to individual DNA k-mers

Motivation: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part because their specificity is difficult to determine experimentally and in part because the mechanisms governing sequence specificity are incompletely understood. Recent techniques that estimate the affinity of TFs to all possible k-mers provide an opportunity to study DNA–protein interaction mechanisms, and may make it possible to infer binding preferences for members of a TF family when such information is available for other family members.

Results: We used a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences to ask how well the binding profiles of homeodomains can be predicted from their protein sequences alone. We evaluated a panel of standard statistical inference techniques, as well as variations in the protein features considered. A nearest-neighbour approach restricted to functionally important residues emerged as one of the most effective methods. Our results underscore the complexity of TF–DNA recognition and suggest a rational approach for future analyses of TF families.

Contact: t.hughes@utoronto.ca

Supplementary information: Supplementary data are available at Bioinformatics online.
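To make the nearest-neighbour idea from the Results concrete, the following is a minimal sketch, not the authors' implementation. It assumes pre-aligned protein sequences, a plain mismatch count over a chosen set of residue positions as the distance, and averaging over tied neighbours; the function names, the toy sequences, the selected positions and the k-mer scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# A training entry pairs an aligned protein sequence with its k-mer preference profile.
Profile = Dict[str, float]
Training = Dict[str, Tuple[str, Profile]]


def mismatches(a: str, b: str, positions: Sequence[int]) -> int:
    """Count mismatches between two aligned sequences at the selected positions only."""
    return sum(a[i] != b[i] for i in positions)


def predict_profile(query: str, training: Training, positions: Sequence[int]) -> Profile:
    """Predict a k-mer profile by copying the nearest training protein's profile.

    Distance is the mismatch count at `positions` (an assumption of this sketch).
    When several training proteins tie for nearest, their profiles are averaged.
    """
    distances = {
        name: mismatches(query, seq, positions)
        for name, (seq, _) in training.items()
    }
    best = min(distances.values())
    nearest: List[str] = [name for name, d in distances.items() if d == best]
    kmers = training[nearest[0]][1].keys()
    return {
        kmer: sum(training[name][1][kmer] for name in nearest) / len(nearest)
        for kmer in kmers
    }


# Toy data (hypothetical): two "homeodomains" with scores for two 8-mers.
training: Training = {
    "HdA": ("AKRQNT", {"TAATTAAT": 0.9, "TGATTGAT": 0.1}),
    "HdB": ("AKKQKT", {"TAATTAAT": 0.2, "TGATTGAT": 0.8}),
}
positions = [2, 4]  # hypothetical indices standing in for functionally important residues

predicted = predict_profile("GKRQNS", training, positions)
```

Here the query matches HdA at both selected positions, so HdA's profile is returned unchanged, even though the query differs from HdA elsewhere in the sequence. Restricting the comparison to a few positions is the design choice the abstract highlights; which residues to select is left open in this sketch.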
