PredRBR: Accurate Prediction of RNA-Binding Residues in proteins using Gradient Tree Boosting

Prediction of protein-RNA binding sites is one of the most challenging and intriguing problems in computational biology. Here, we propose an effective machine learning algorithm termed PredRBR (Prediction of RNA Binding Residues), which uses the Gradient Tree Boosting algorithm and the mRMR-IFS feature selection method in combination with sequence features, structural characteristics, and two categories of structural neighborhood features to predict RNA binding sites in proteins. We evaluate PredRBR on the independent test dataset (RBP101) and obtain a significant improvement in prediction performance compared with other state-of-the-art approaches. In addition, we test the variable importance of the diverse feature types. The results show that structural neighborhood features play a crucial role in identifying RNA binding sites.
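
As an illustration of the incremental feature selection (IFS) half of mRMR-IFS, the following minimal sketch assumes the features have already been ranked (for example by mRMR) and that some scoring routine is available. It adds one ranked feature at a time and keeps the prefix with the best score. The function names, the feature names, and the toy scoring rule are hypothetical and are not taken from PredRBR; in practice the score would come from cross-validating a gradient tree boosting classifier on each candidate subset.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Return the best-scoring prefix of an already ranked feature list."""
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked_features:
        # Grow the candidate subset by one feature, in rank order.
        current.append(feature)
        score = evaluate(list(current))
        # Keep the first prefix that reaches the highest score.
        if score > best_score:
            best_score = score
            best_subset = list(current)
    return best_subset, best_score


def toy_evaluate(subset: List[str]) -> float:
    """Hypothetical stand-in for a cross-validated classifier score."""
    useful = {"neighborhood_feature", "sequence_feature", "structure_feature"}
    gain = sum(1.0 for name in subset if name in useful)
    penalty = 0.25 * sum(1 for name in subset if name not in useful)
    return gain - penalty


# Hypothetical ranking, most relevant feature first.
ranked = [
    "neighborhood_feature",
    "sequence_feature",
    "structure_feature",
    "noise_a",
    "noise_b",
]
selected, score = incremental_feature_selection(ranked, toy_evaluate)
print(selected, score)
```

With this toy scoring rule the three informative features are kept and the two noise features are dropped, because adding them lowers the score.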
