Prediction of RNA-Binding Residues in Protein Sequences Using Support Vector Machines

Understanding the molecular recognition between RNA and proteins is central to elucidating many biological processes in the cell. Although structural data are available for some protein-RNA complexes, the interaction patterns remain largely unclear. In this study, support vector machines and artificial neural networks were trained to predict RNA-binding residues from five sequence-derived features: solvent accessible surface area, BLAST-based conservation score, hydrophobicity index, side chain pKa value, and molecular mass of an amino acid. Support vector machines outperform neural networks in predicting RNA-binding residues. The best support vector machine achieves a prediction strength (the average of sensitivity and specificity) of 70.74%, whereas the neural networks reach 67.79% on the same measure. The results suggest that RNA-binding residues can be predicted directly from amino acid sequence information. Online prediction of RNA-binding residues is available at http://bioinformatics.ksu.edu/bindn/
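
The performance measure quoted above, prediction strength, is defined as the average of sensitivity and specificity. A minimal sketch of how it could be computed from per-residue labels follows; the function name `prediction_strength`, the 1/0 label encoding (1 = RNA-binding, 0 = non-binding), and the toy data are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def prediction_strength(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return (sensitivity + specificity) / 2 for binary labels.

    Assumed encoding: 1 = RNA-binding residue, 0 = non-binding residue.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # fraction of binding residues recovered
    specificity = tn / (tn + fp)  # fraction of non-binding residues recovered
    return (sensitivity + specificity) / 2


# Toy example: 3 binding and 5 non-binding residues (hypothetical data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
strength = prediction_strength(y_true, y_pred)
print(f"prediction strength = {strength:.2%}")
```

Because it weights the two classes equally, this measure is less distorted than raw accuracy when binding residues are much rarer than non-binding ones.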
