aPRBind: protein-RNA interface prediction by combining sequence and I-TASSER model-based structural features learned with convolutional neural networks

MOTIVATION: Protein-RNA interactions play a critical role in many biological processes. Accurately predicting RNA-binding residues in proteins remains one of the most challenging and intriguing problems in computational biology. Existing methods still have relatively low accuracy, especially the sequence-based ab-initio methods.

RESULTS: In this work, we propose aPRBind, a convolutional neural network (CNN)-based ab-initio method for RNA-binding residue prediction. aPRBind is trained on sequence features and on structural features (notably residue dynamics information and a residue-nucleotide propensity that we developed) extracted from structures predicted by I-TASSER. Analysis of the feature contributions indicates that the sequence features are the most important, followed by the dynamics information, and that the sequence and structural features are complementary in binding site prediction. A performance comparison with peer methods on a benchmark dataset shows that aPRBind outperforms some state-of-the-art ab-initio methods. aPRBind gives better predictions for modeled structures with TM-score ≥ 0.5. At the same time, because the structural features are not very sensitive to refinement of the 3-dimensional structure, aPRBind depends only marginally on the accuracy of the structure model, which allows it to be applied to RNA-binding site prediction for modeled or unbound structures.

AVAILABILITY: The source code is available at https://github.com/ChunhuaLiLab/aPRbind.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
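The combination of per-residue sequence and structural features described in the abstract can be pictured with a minimal sketch. The abstract does not give the feature encoding, the window size, or the network layout, so everything below is an illustrative assumption rather than the authors' implementation: the function name `build_windows`, the `half_window` parameter, and the toy feature values are hypothetical. The sketch only shows one common way to turn two per-residue feature tables into fixed-size, zero-padded windows that a CNN could consume.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def build_windows(
    seq_feats: Sequence[Sequence[float]],
    struct_feats: Sequence[Sequence[float]],
    half_window: int = 2,  # assumed value; the abstract does not state a window size
) -> List[Matrix]:
    """Concatenate per-residue features and cut one zero-padded window per residue."""
    if len(seq_feats) != len(struct_feats):
        raise ValueError("both feature tables need one row per residue")

    # Join the sequence and structural feature vectors residue by residue.
    combined = [list(s) + list(t) for s, t in zip(seq_feats, struct_feats)]
    if not combined:
        return []

    # Pad both chain ends with zero rows so terminal residues get full windows.
    width = len(combined[0])
    pad = [[0.0] * width for _ in range(half_window)]
    padded = pad + combined + pad

    size = 2 * half_window + 1
    return [padded[i:i + size] for i in range(len(combined))]


# Toy input: 4 residues, 2 hypothetical sequence features and
# 1 hypothetical structural feature per residue.
seq = [[0.1, 0.9], [0.4, 0.6], [0.8, 0.2], [0.5, 0.5]]
struct = [[1.0], [0.3], [0.7], [0.2]]
windows = build_windows(seq, struct, half_window=2)
```

Each entry of `windows` is a (2 × `half_window` + 1) by (number of features) matrix centered on one residue, so a classifier can score that residue from its local sequence and structural context.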
