BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences

BindN takes an amino acid sequence as input and predicts potential DNA- or RNA-binding residues with support vector machines (SVMs). Protein datasets with known DNA- or RNA-binding residues were selected from the Protein Data Bank (PDB), and SVM models were constructed using data instances encoded with three sequence features: the side chain pKa value, hydrophobicity index and molecular mass of an amino acid. The results suggest that DNA-binding residues can be predicted at 69.40% sensitivity and 70.47% specificity, while prediction of RNA-binding residues achieves 66.28% sensitivity and 69.84% specificity. When compared with previous studies, the SVM models appear to be more accurate and more efficient for online predictions. BindN provides a useful tool for understanding the function of DNA- and RNA-binding proteins based on primary sequence data.
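The encoding and evaluation described above can be illustrated with a minimal sketch. It is not the BindN implementation. The feature table uses approximate textbook values, with the Kyte-Doolittle hydropathy scale assumed as the hydrophobicity index. The placeholder pKa of 7.0 for residues without an ionizable side chain, the sliding window with zero padding, and the names `FEATURES`, `encode_sequence` and `sensitivity_specificity` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# (side-chain pKa, hydropathy, molecular mass in Da) per amino acid.
# Approximate textbook values, NOT the tables used by BindN.
# Assumption: residues with no ionizable side chain get a placeholder pKa of 7.0.
FEATURES: Dict[str, Tuple[float, float, float]] = {
    "A": (7.00, 1.8, 89.09),   "R": (12.48, -4.5, 174.20),
    "N": (7.00, -3.5, 132.12), "D": (3.65, -3.5, 133.10),
    "C": (8.18, 2.5, 121.16),  "Q": (7.00, -3.5, 146.15),
    "E": (4.25, -3.5, 147.13), "G": (7.00, -0.4, 75.07),
    "H": (6.00, -3.2, 155.16), "I": (7.00, 4.5, 131.17),
    "L": (7.00, 3.8, 131.17),  "K": (10.53, -3.9, 146.19),
    "M": (7.00, 1.9, 149.21),  "F": (7.00, 2.8, 165.19),
    "P": (7.00, -1.6, 115.13), "S": (7.00, -0.8, 105.09),
    "T": (7.00, -0.7, 119.12), "W": (7.00, -0.9, 204.23),
    "Y": (10.07, -1.3, 181.19), "V": (7.00, 4.2, 117.15),
}

# Assumption: positions beyond the sequence ends are padded with zeros.
PAD: Tuple[float, float, float] = (0.0, 0.0, 0.0)


def encode_sequence(seq: str, window: int = 3) -> List[List[float]]:
    """Return one feature vector per residue, built from a centered window.

    Each vector concatenates the three features of every residue in the
    window, so its length is 3 * window. Unknown residue letters raise KeyError.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    vectors: List[List[float]] = []
    for i in range(len(seq)):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            if 0 <= j < len(seq):
                vec.extend(FEATURES[seq[j].upper()])
            else:
                vec.extend(PAD)
        vectors.append(vec)
    return vectors


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Labels are 1 for binding residues and 0 for non-binding residues.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec
```

The vectors returned by `encode_sequence` are the kind of fixed-length input an SVM classifier expects. Training the classifier itself is outside the scope of this sketch.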
