ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors

This article presents ProteDNA, a sequence-based predictor that identifies the sequence-specific DNA-binding residues in a transcription factor (TF). Protein–DNA interactions involve two binding mechanisms: sequence-specific binding and nonspecific binding. Sequence-specific binding occurs between protein side chains and nucleotide bases and mediates the sequence-specific recognition of genes, so it is essential for correct gene regulation. ProteDNA is distinctive in that it is designed specifically to identify sequence-specific binding residues. To accommodate users with different application needs, ProteDNA operates in two modes: a high-precision mode and a balanced mode. In the experiments reported in this article, the high-precision mode achieved a precision of 82.3%, specificity of 99.3%, sensitivity of 49.8% and accuracy of 96.5%, while the balanced mode achieved a precision of 60.8%, specificity of 97.6%, sensitivity of 60.7% and accuracy of 95.4%. ProteDNA is available at the following websites: http://protedna.csbb.ntu.edu.tw/ , http://protedna.csie.ntu.edu.tw/ and http://bio222.esoe.ntu.edu.tw/ProteDNA/ .
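The four figures quoted for each mode are standard residue-level classification metrics. The Python sketch below shows how they follow from a confusion matrix in which binding residues are the positive class. The abstract does not say how the two modes are realized; the `threshold` parameter assumes, purely for illustration, that a mode corresponds to a cut-off on a per-residue score. The names `Confusion`, `metrics` and `confusion_at`, and the toy scores, are hypothetical and not part of ProteDNA.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Residue-level confusion counts; binding residues are the positive class."""
    tp: int  # binding residues predicted as binding
    fp: int  # non-binding residues predicted as binding
    tn: int  # non-binding residues predicted as non-binding
    fn: int  # binding residues predicted as non-binding


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a metric is undefined (e.g. no positive calls).
    return num / den if den else math.nan


def metrics(c: Confusion) -> dict[str, float]:
    """Precision, specificity, sensitivity and accuracy as fractions in [0, 1]."""
    return {
        "precision": _ratio(c.tp, c.tp + c.fp),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
    }


def confusion_at(scores: list[float], labels: list[int], threshold: float) -> Confusion:
    """Call a residue 'binding' when its score is >= threshold (hypothetical scheme)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


if __name__ == "__main__":
    # Toy per-residue scores and labels (1 = binding), invented for illustration only.
    scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
    labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
    for name, cut in [("strict cut-off", 0.85), ("looser cut-off", 0.50)]:
        m = metrics(confusion_at(scores, labels, cut))
        print(name, {k: round(v, 3) for k, v in m.items()})
```

On the toy data, raising the cut-off from 0.50 to 0.85 lifts precision from 0.6 to 1.0 while sensitivity drops from 0.75 to 0.5, the same direction of trade-off that separates the balanced mode from the high-precision mode. When non-binding residues dominate a data set, accuracy is pulled toward specificity, which is consistent with the reported accuracies lying close to the reported specificities.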
