Predicting DNA recognition by Cys2His2 zinc finger proteins

MOTIVATION: Cys2His2 zinc finger (ZF) proteins represent the largest class of eukaryotic transcription factors. Their modular structure and well-conserved protein-DNA interface allow the development of computational approaches for predicting their DNA-binding preferences even when no binding sites are known for a particular protein. The 'canonical model' for ZF protein-DNA interaction consists of only four amino acid-nucleotide contacts per zinc finger domain.

RESULTS: We present an approach for predicting ZF binding based on support vector machines (SVMs). While most previous computational approaches have been based solely on examples of known ZF protein-DNA interactions, ours additionally incorporates information about protein-DNA pairs known to bind weakly or not at all. Moreover, SVMs with a linear kernel can naturally incorporate constraints about the relative binding affinities of protein-DNA pairs; this type of information has not been used previously in predicting ZF protein-DNA binding. Here, we build a high-quality, literature-derived experimental database of ZF-DNA binding examples and use it to test both linear and polynomial kernels for predicting ZF protein-DNA binding on the basis of the canonical binding model. The polynomial SVM outperforms both previously published prediction procedures and the linear SVM. This may indicate dependencies between contacts in the canonical binding model, and it suggests that modifying the underlying structural model may further improve performance in predicting ZF protein-DNA binding. Overall, this work demonstrates that methods incorporating information about non-binding and relative binding of protein-DNA pairs have great potential for effective prediction of protein-DNA interactions.

AVAILABILITY: An online tool for predicting ZF DNA binding is available at http://compbio.cs.princeton.edu/zf/.
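
The way a canonical-model representation might feed linear and polynomial kernels can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-hot encoding, the function names (`encode`, `linear_kernel`, `polynomial_kernel`, `preference_example`), the kernel parameters, and the example contact pairs are all hypothetical. The sketch only shows why a polynomial kernel can capture dependencies between contacts that a linear kernel cannot, and how a relative-affinity constraint can be written as a difference vector for a linear model.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
BASES = "ACGT"
N_CONTACTS = 4  # the canonical model has four contacts per finger
BLOCK = len(AMINO_ACIDS) * len(BASES)  # 80 features per contact


def encode(contacts):
    """One-hot encode four (amino acid, base) contact pairs.

    Assumption: each contact gets its own block of 80 indicator
    features, giving a 320-dimensional vector per finger.
    """
    if len(contacts) != N_CONTACTS:
        raise ValueError("expected exactly four contacts")
    vec = [0] * (N_CONTACTS * BLOCK)
    for i, (aa, base) in enumerate(contacts):
        offset = AMINO_ACIDS.index(aa) * len(BASES) + BASES.index(base)
        vec[i * BLOCK + offset] = 1
    return vec


def linear_kernel(x, y):
    """Dot product: counts contacts shared by two examples."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1):
    """(x.y + coef0)^degree: implicitly includes products of
    contact features, i.e. pairwise contact dependencies."""
    return (linear_kernel(x, y) + coef0) ** degree


def preference_example(stronger, weaker):
    """Difference vector for a relative-binding constraint.

    For a linear model, w.stronger > w.weaker is equivalent to
    w.(stronger - weaker) > 0, so the difference can be used as
    a positive training example.
    """
    return [s - w for s, w in zip(stronger, weaker)]


# Hypothetical contact pairs, chosen only to exercise the code.
a = encode([("R", "G"), ("D", "C"), ("E", "C"), ("R", "G")])
b = encode([("R", "G"), ("D", "C"), ("H", "G"), ("R", "G")])

print(linear_kernel(a, b))      # 3 (three of four contacts shared)
print(polynomial_kernel(a, b))  # 16, i.e. (3 + 1) ** 2
```

Because the difference-vector trick relies on the model being linear in the features, it does not carry over directly to the polynomial kernel, which is consistent with the abstract's statement that relative-affinity constraints are incorporated with the linear kernel.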
