Protein-specific prediction of mRNA binding using RNA sequences, binding motifs and predicted secondary structures

Background: RNA-binding proteins interact with specific RNA molecules to regulate important cellular processes. It is therefore necessary to identify the RNA interaction partners in order to understand the precise functions of such proteins. Protein-RNA interactions are typically characterized using in vivo and in vitro experiments, but these may not detect all binding partners. Computational methods that capture the protein-dependent nature of such binding interactions could therefore help to predict potential binding partners in silico.

Results: We have developed three methods to predict whether an RNA can interact with a particular RNA-binding protein using support vector machines and different features based on the sequence (the Oli method), the motif score (the OliMo method) and the secondary structure (the OliMoSS method). We applied these approaches to different experimentally derived datasets and compared the predictions with RNAcontext and RPISeq. Oli outperformed OliMoSS and RPISeq, confirming our protein-specific predictions and suggesting that tetranucleotide frequencies are appropriate discriminative features. Oli and RNAcontext were the most competitive methods in terms of the area under the curve. In a precision-recall curve analysis, Oli achieved higher precision values. On a second experimental dataset including real negative binding information, Oli outperformed RNAcontext with a precision of 0.73 vs. 0.59.

Conclusions: Our experiments showed that features based on primary sequence information are sufficiently discriminating to predict specific RNA-protein interactions. Sequence motifs and secondary structure information were not necessary to improve these predictions. Finally, we confirmed that protein-specific experimental data concerning RNA-protein interactions are valuable sources of information that can be used for the efficient training of models for in silico predictions. The scripts are available upon request from the corresponding author.
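The abstract names tetranucleotide frequencies as the discriminative sequence features behind the Oli method. The following is a minimal sketch of how such a feature vector might be computed; it is not the authors' script. The function name `tetranucleotide_frequencies`, the fixed ACGU ordering, the conversion of T to U, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

K = 4  # tetranucleotides
ALPHABET = "ACGU"

# All 4^4 = 256 tetranucleotides in a fixed order, so that every RNA
# is mapped onto the same feature layout (assumed ordering, not from the paper).
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetranucleotide_frequencies(seq):
    """Return a 256-element list of relative tetranucleotide frequencies.

    Hypothetical helper: overlapping windows of length 4 are counted and
    normalized by the number of valid windows. Windows containing symbols
    outside ACGU (for example N) are skipped, which is an assumption.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or without any valid window: all-zero vector.
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


if __name__ == "__main__":
    features = tetranucleotide_frequencies("AUGGCUAUGG")
    print(len(features), features[KMER_INDEX["AUGG"]])
```

Vectors of this kind, computed for RNAs known to bind or not bind a given protein, could then be passed to a support vector machine as one labeled example per RNA. Kernel choice, parameter tuning and class balancing are left out of this sketch.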
