Prediction of protein-RNA binding sites by a random forest method with combined features

MOTIVATION: Protein-RNA interactions play a key role in many biological processes, including protein synthesis, mRNA processing, mRNA assembly, ribosome function and the eukaryotic spliceosome. Reliable identification of the RNA-binding sites of a protein is therefore important for functional annotation and site-directed mutagenesis. Accumulated experimental data on protein-RNA interactions reveal that an RNA-binding residue with different neighboring amino acids often exhibits different preferences for its RNA partners. These preferences can in turn be assessed through the interdependence between the amino acid fragment and the interacting RNA nucleotide.

RESULTS: In this work, we propose a novel classification method to identify RNA-binding sites in proteins by combining a new interaction feature (interaction propensity) with other sequence- and structure-based features. Specifically, the interaction propensity represents the binding specificity of a protein residue for the interacting RNA nucleotide, taking into account the residue's neighbors on both sides within a protein residue triplet. The sequence- and structure-based features of the residues are combined to discriminate the interaction propensity of amino acids with RNA. We predict RNA-interacting residues in proteins with a random forest classifier. The experiments show that our method detects annotated protein-RNA interaction sites with high accuracy. It achieves an accuracy of 84.5%, an F-measure of 0.85 and an AUC of 0.92 in predicting RNA-binding residues on a dataset of 205 non-homologous RNA-binding proteins. In the comparison study, it also outperforms several existing RNA-binding residue predictors (RNABindR, BindN, RNAProB and PPRint) and several alternative machine learning methods (support vector machine, naive Bayes and neural network). Furthermore, we provide biological insights into the roles of sequence and structure in protein-RNA interactions, both by evaluating how much each feature contributes to predictive accuracy and by analyzing the binding patterns of interacting residues.

AVAILABILITY: All the source data and code are available at http://www.aporc.org/doc/wiki/PRNA or http://www.sysbio.ac.cn/datatools.asp

CONTACT: lnchen@sibs.ac.cn

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
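
The interaction propensity described under RESULTS can be illustrated with a small sketch. The abstract does not give the formula, so the frequency-ratio definition below is an assumption made for illustration only: the observed frequency of a (residue triplet, nucleotide) pair is divided by the frequency expected if triplets and nucleotides paired independently. The function names, the toy sequence and the contact map are all hypothetical and are not taken from the authors' released code.

```python
from collections import Counter


def triplets(sequence):
    """Yield (center_index, triplet) for each residue that has a neighbor on both sides."""
    for i in range(1, len(sequence) - 1):
        yield i, sequence[i - 1:i + 2]


def interaction_propensity(sequence, contacts):
    """Score each observed (triplet, nucleotide) pair.

    `contacts` maps the index of a center residue to the nucleotide it touches.
    Assumed definition (not from the paper's abstract):
        propensity(t, n) = f(t, n) / (f(t) * f(n))
    where f(t, n) is the pair frequency among all contacts, f(t) is the triplet
    frequency in the sequence, and f(n) is the nucleotide frequency among contacts.
    """
    triplet_counts = Counter()
    pair_counts = Counter()
    nucleotide_counts = Counter()
    for i, triplet in triplets(sequence):
        triplet_counts[triplet] += 1
        if i in contacts:
            pair_counts[(triplet, contacts[i])] += 1
            nucleotide_counts[contacts[i]] += 1

    total_triplets = sum(triplet_counts.values())
    total_pairs = sum(pair_counts.values())
    if total_pairs == 0:
        return {}  # no contacts observed, so nothing to score

    scores = {}
    for (triplet, nucleotide), count in pair_counts.items():
        f_pair = count / total_pairs
        f_triplet = triplet_counts[triplet] / total_triplets
        f_nucleotide = nucleotide_counts[nucleotide] / total_pairs
        scores[(triplet, nucleotide)] = f_pair / (f_triplet * f_nucleotide)
    return scores


# Toy data: a made-up sequence and a made-up residue-to-nucleotide contact map.
example_scores = interaction_propensity("GKRGKRAAA", {1: "A", 4: "A", 7: "U"})
for pair, score in sorted(example_scores.items()):
    print(pair, round(score, 3))
```

A score above 1 means the triplet contacts that nucleotide more often than independent pairing would predict. In a pipeline like the one the abstract outlines, such scores would be one feature column alongside the sequence- and structure-based features passed to the classifier.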
