An accurate feature‐based method for identifying DNA‐binding residues on protein surfaces

Proteins that interact with DNA play vital roles in all mechanisms of gene expression and regulation. To understand these activities, it is crucial to identify and analyze the DNA‐binding residues on the surfaces of DNA‐binding proteins. Here, we propose two novel features, B‐factor and packing density, and combine them with several conventional features to characterize the DNA‐binding residues in a well‐constructed representative dataset of 119 protein‐DNA complexes from the Protein Data Bank (PDB). Based on the selected features, a prediction model for DNA‐binding residues was constructed using a support vector machine (SVM). The predictor was evaluated using 5‐fold cross‐validation on the above dataset of 123 DNA‐binding proteins. Moreover, two independent datasets of 83 DNA‐bound protein structures and their corresponding DNA‐free forms were compiled. The B‐factor and packing density features were statistically analyzed on these 83 pairs of holo‐apo protein structures. Finally, we developed the SVM model to accurately predict DNA‐binding residues on the protein surface, given the DNA‐free structure of a protein. The results shown here indicate that our method represents a significant improvement over existing approaches such as DISPLAR. This observation suggests that our method will be useful in studying protein‐DNA interactions and in guiding subsequent work such as site‐directed mutagenesis and protein‐DNA docking. Proteins 2011. © 2010 Wiley‐Liss, Inc.
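As a rough illustration of what a packing density feature can look like, the sketch below counts how many other points lie within a fixed radius of each point. This is an assumption made for illustration only: the abstract does not give the authors' exact definition, and the function name `packing_density`, the 10 Å cutoff, and the toy coordinates are all hypothetical.

```python
import math

def packing_density(coords, cutoff=10.0):
    """For each 3D point, count the other points within `cutoff`.

    Illustrative neighbor-count definition only; the paper's own
    definition of packing density may differ.
    """
    counts = []
    for i, a in enumerate(coords):
        n = 0
        for j, b in enumerate(coords):
            # Skip the point itself; count neighbors inside the sphere.
            if i != j and math.dist(a, b) <= cutoff:
                n += 1
        counts.append(n)
    return counts

# Toy coordinates in angstroms (hypothetical, not taken from any PDB entry):
# three closely spaced points and one isolated point.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
densities = packing_density(toy, cutoff=10.0)
print(densities)
```

In such a scheme, one value per residue (for example, computed from a representative atom) would be one entry of the feature vector passed to the classifier, alongside the B‐factor and the conventional features.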
