Quantifying sequence and structural features of protein–RNA interactions

Increasing awareness of the importance of protein–RNA interactions has motivated many approaches to predicting residue-level RNA-binding sites in proteins from sequence or structural characteristics. Sequence-based predictors usually have high sensitivity but low specificity; conversely, structure-based predictors tend to have high specificity but lower sensitivity. Here we quantified the contribution of both sequence- and structure-based features as indicators of RNA-binding propensity using a machine-learning approach. To capture structural information for proteins without a known structure, we used homology modeling to extract the relevant structural features. Several novel and modified features improved the accuracy of residue-level RNA-binding propensity prediction beyond what has been reported previously, including by meta-prediction servers. These features include hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side-chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA-binding sites.
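The sensitivity/specificity trade-off mentioned above can be made concrete with a minimal sketch. It assumes per-residue binary labels (1 = RNA-binding, 0 = non-binding); the function name and the toy labels are hypothetical and are not taken from aaRNA or from any of the predictors discussed.

```python
import math
from typing import Sequence, Tuple


def sensitivity_specificity(
    true_labels: Sequence[int], predicted_labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for per-residue binary labels.

    Labels are assumed to be 1 for RNA-binding residues and 0 otherwise.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1  # binding residue correctly predicted
        elif truth == 1 and pred == 0:
            fn += 1  # binding residue missed
        elif truth == 0 and pred == 0:
            tn += 1  # non-binding residue correctly rejected
        else:
            fp += 1  # non-binding residue wrongly flagged

    # Sensitivity: fraction of true binding residues that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: fraction of non-binding residues that were rejected.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Hypothetical toy example: 8 residues, 3 of which truly bind RNA.
toy_truth = [1, 1, 1, 0, 0, 0, 0, 0]
toy_prediction = [1, 1, 0, 1, 0, 0, 0, 0]
toy_sensitivity, toy_specificity = sensitivity_specificity(toy_truth, toy_prediction)
```

In this toy case two of the three binding residues are recovered and four of the five non-binding residues are rejected. A predictor that flags more residues would tend to raise the first value at the expense of the second.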
