RBRDetector: Improved prediction of binding residues on RNA‐binding protein structures using complementary feature‐ and template‐based strategies

Computational prediction of RNA‐binding residues helps uncover the mechanisms underlying protein‐RNA interactions. Traditional algorithms apply either a feature‐based or a template‐based prediction strategy in isolation to recognize these crucial residues, which can restrict their predictive power. To improve RNA‐binding residue prediction, we propose RBRDetector (RNA‐Binding Residue Detector), the first integrative algorithm that combines these two strategies. The feature‐based approach is an ensemble learning predictor comprising multiple structure‐based classifiers, in which well‐defined evolutionary and structural features, in conjunction with the sequential or structural microenvironment, serve as the inputs of support vector machines. In parallel, we constructed a template‐based predictor that recognizes putative RNA‐binding regions by structurally aligning the query protein to RNA‐binding proteins with known structures. The final RBRDetector algorithm fuses our feature‐ and template‐based approaches through a piecewise function. By validating our predictors on diverse types of structural data, including bound and unbound structures, native and simulated structures, and protein structures binding to different RNA functional groups, we consistently demonstrated that RBRDetector not only had clear advantages over its component methods but also significantly outperformed current state‐of‐the‐art algorithms. Nevertheless, the major limitation of our algorithm is that it does not discriminate between the two nucleic acids: applied to DNA‐binding proteins, it incorrectly predicted the DNA‐binding regions as RNA‐binding interfaces. Finally, we implemented the RBRDetector algorithm as a user‐friendly web server, which is freely accessible at http://ibi.hzau.edu.cn/rbrdetector. Proteins 2014; 82:2455–2471. © 2014 Wiley Periodicals, Inc.
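
The abstract states that the feature‐ and template‐based predictions are fused through a piecewise function but does not give its form. The following is only a minimal sketch of what such a fusion could look like, not the published RBRDetector rule. The names `ResiduePrediction` and `fuse_predictions`, the use of a single template similarity value, and all cutoffs and weights are hypothetical assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResiduePrediction:
    """Per-residue inputs to the fusion step (hypothetical structure)."""
    feature_score: float  # assumed: ensemble classifier output scaled to [0, 1]
    template_hit: bool    # assumed: residue aligns to a binding residue of the template


def fuse_predictions(
    residues: List[ResiduePrediction],
    template_similarity: Optional[float],
    similarity_cutoff: float = 0.5,   # hypothetical threshold, not from the paper
    template_weight: float = 0.5,     # hypothetical weight, not from the paper
    decision_cutoff: float = 0.5,     # hypothetical threshold, not from the paper
) -> List[bool]:
    """Illustrative piecewise fusion of feature- and template-based predictions.

    Branch 1: no template, or a template too dissimilar to trust,
              so the feature-based score is used alone.
    Branch 2: a sufficiently similar template exists, so the feature score
              is blended with the template-derived label.
    """
    use_template = (
        template_similarity is not None
        and template_similarity >= similarity_cutoff
    )
    labels = []
    for res in residues:
        if use_template:
            template_score = 1.0 if res.template_hit else 0.0
            score = ((1.0 - template_weight) * res.feature_score
                     + template_weight * template_score)
        else:
            score = res.feature_score
        labels.append(score >= decision_cutoff)
    return labels
```

With these assumed defaults, a residue with a high feature score but no template support can be rejected once a trusted template is available, while a residue with modest feature score but template support can be accepted. Whether the actual method blends scores, switches between them, or uses a different similarity measure is not stated in the abstract.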
