Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors

Background: RNA-binding proteins participate in many important biological processes involving RNA-mediated gene regulation, and several computational methods have recently been developed to predict their protein-RNA interactions. Newly developed discriminative descriptors can help improve the accuracy of these methods and provide further meaningful information for researchers.

Results: In this work, we designed two structural features, residue electrostatic surface potential and triplet interface propensity. Statistical and structural analysis of protein-RNA complexes showed that both features are powerful for identifying RNA-binding protein residues. Using these two features together with other effective structure- and sequence-based features, we constructed a random forest classifier to predict RNA-binding residues. In five-fold cross-validation on the training set RBP195, our method achieved an area under the receiver operating characteristic curve (AUC) of 0.900. When applied to the test set RBP68, the prediction accuracy (ACC) was 0.868 and the F-score was 0.631.

Conclusions: The good prediction performance of our method shows that the two newly designed descriptors are discriminative for inferring protein residues that interact with RNAs. To facilitate the use of our method, we constructed a web server called RNAProSite, which implements the proposed method and is freely available at http://lilab.ecust.edu.cn/NABind.
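
For readers unfamiliar with the three reported metrics (AUC, ACC, F-score), the following is a minimal sketch of how they can be computed for per-residue binary predictions. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions chosen only for illustration, and the AUC uses the standard pairwise-ranking (Mann-Whitney) formulation.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of residues whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the binding class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random binding residue outscores a random
    non-binding residue; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one residue of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = RNA-binding residue, 0 = non-binding residue.
labels = [1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.3, 0.4, 0.35]  # classifier scores
preds = [1 if s >= 0.5 else 0 for s in scores]       # assumed 0.5 threshold

print(accuracy(labels, preds), f_score(labels, preds), auc(labels, scores))
```

Because binding residues are usually a minority of all residues, ACC alone can look high even for a weak predictor, which is one reason the F-score and AUC are typically reported alongside it.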
