An Overview of the Prediction of Protein DNA-Binding Sites

Interactions between proteins and DNA play an important role in many essential biological processes, such as DNA replication, transcription, splicing, and repair. Identifying the amino acid residues involved in DNA-binding sites is critical for understanding the mechanisms of these processes. Over the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites from protein sequence and/or structural information, and these approaches play an important role in complementing experimental strategies. Current approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods for predicting protein DNA-binding sites, covering data sets, residue-level sequence and structural features, the comparison and selection of machine learning methods, evaluation methods, performance comparisons of different tools, and future directions for the field. In particular, we detail the meta-analysis of protein DNA-binding sites. We also highlight specific implications that are likely to lead to novel prediction methods, increased performance, or practical applications.
