Mining graph patterns in the protein-RNA interfaces

Protein-RNA interactions play important roles in biological systems. The goal of this study is to discover structural patterns in protein-RNA interfaces that contribute to the affinity of the interactions. We represented known protein-RNA interfaces as graphs and then identified common subgraphs enriched in the interfaces. Comparison of the discovered graph patterns with UniProt annotations showed that the patterns overlapped significantly with residue sites that had been experimentally shown to be crucial for RNA binding. Using 200 patterns as input features, a Support Vector Machine classified protein surface patches into RNA-binding sites and non-RNA-binding sites with 84.0% accuracy and 88.9% precision. We also built a simple scoring function that counts the graph patterns occurring in a protein-RNA interface. This scoring function discriminated near-native protein-RNA complexes from docking decoys with performance comparable to that of a state-of-the-art complex scoring function.
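
The pattern-counting score described above can be illustrated with a minimal sketch. It is not the authors' implementation: the graph encoding (nodes labeled by residue or nucleotide type, edges for contacts), the brute-force matching, and the names `make_graph`, `occurs_in`, and `pattern_score` are all assumptions made for illustration. The sketch also assumes that each pattern is counted once if it occurs at least once, which is only one possible reading of "total number of the graph patterns".

```python
from itertools import permutations

def make_graph(labels, edges):
    """Build a labeled undirected graph (hypothetical encoding).

    labels: dict mapping node id -> label (e.g. residue or nucleotide type)
    edges:  iterable of (u, v) contact pairs
    """
    adj = {node: set() for node in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {"labels": labels, "adj": adj}

def occurs_in(pattern, graph):
    """Return True if the pattern maps into the graph so that node labels
    match and every pattern edge is also a graph edge.

    Brute force over injective node assignments; only suitable for small
    patterns. A dedicated subgraph isomorphism algorithm would be needed
    at realistic scale.
    """
    p_nodes = list(pattern["labels"])
    g_nodes = list(graph["labels"])
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(pattern["labels"][p] != graph["labels"][g]
               for p, g in mapping.items()):
            continue
        if all(mapping[v] in graph["adj"][mapping[u]]
               for u in p_nodes for v in pattern["adj"][u]):
            return True
    return False

def pattern_score(interface, patterns):
    """Score an interface as the number of patterns that occur in it."""
    return sum(occurs_in(p, interface) for p in patterns)

# Toy interface: two residues and two nucleotides with three contacts.
interface = make_graph(
    {0: "ARG", 1: "LYS", 2: "G", 3: "U"},
    [(0, 2), (1, 3), (2, 3)],
)

# Toy pattern library (made up for this example, not mined from data).
patterns = [
    make_graph({"a": "ARG", "b": "G"}, [("a", "b")]),
    make_graph({"a": "LYS", "b": "G"}, [("a", "b")]),
    make_graph({"a": "ARG", "b": "G", "c": "U"}, [("a", "b"), ("b", "c")]),
]

score = pattern_score(interface, patterns)
print(score)  # 2: the ARG-G and ARG-G-U patterns occur, LYS-G does not
```

Under this reading, a near-native complex would be expected to score higher than a docking decoy because more of the enriched patterns appear in its interface.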
