rpiCOOL: A tool for In Silico RNA-protein interaction detection using random forest.

Understanding the principles of RNA-protein interactions (RPIs) is critically important for gaining insight into post-transcriptional gene regulation and is useful for guiding studies of many complex diseases. The limitations and difficulties of determining RPIs experimentally create an urgent need for computational methods for RPI prediction. In this paper, we propose a machine learning method that detects RNA-protein interactions from sequence information. We used motif information and repetitive patterns extracted from experimentally validated RNA-protein interactions, in combination with sequence composition, as descriptors to build an RPI prediction model based on a random forest classifier. About 20% of the "sequence motifs" and "nucleotide composition" features were selected as informative by the feature selection methods, which suggests that these two feature types contribute effectively to RPI detection. Results of 10-fold cross-validation experiments on three non-redundant benchmark datasets show that the proposed method outperforms current state-of-the-art methods on various performance measures. In addition, the results reveal that the accuracy of RPI prediction methods can vary considerably across organisms. We have implemented the proposed method, named rpiCOOL, as a stand-alone tool with a user-friendly graphical user interface (GUI) that enables researchers to predict RNA-protein interactions. rpiCOOL is freely available at http://biocool.ir/rpicool.html for non-commercial use.
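To make the idea of sequence composition descriptors concrete, the following is a minimal sketch of how an RNA-protein pair might be turned into a fixed-length feature vector. It is an illustration under stated assumptions, not the rpiCOOL feature set: the function names (`kmer_composition`, `pair_features`), the alphabets, and the k-mer lengths are hypothetical choices, and the motif and repetitive-pattern features described above are not covered.

```python
from itertools import product

# Assumed alphabets: 4 RNA nucleotides and the 20 standard amino acids.
RNA_ALPHABET = "ACGU"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq, alphabet, k):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Windows containing symbols outside `alphabet` are skipped, so the
    frequencies sum to 1 whenever at least one valid k-mer exists.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def pair_features(rna, protein, rna_k=2, protein_k=1):
    """Concatenate RNA and protein composition vectors for one candidate pair.

    The default k values are illustrative assumptions only.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    protein = protein.upper()
    return (kmer_composition(rna, RNA_ALPHABET, rna_k)
            + kmer_composition(protein, PROTEIN_ALPHABET, protein_k))


if __name__ == "__main__":
    features = pair_features("AUGGCU", "MKV")
    print(len(features))  # 16 dinucleotide + 20 amino acid frequencies
```

Vectors of this kind, one per RNA-protein pair and labeled as interacting or non-interacting, are the sort of input a random forest classifier could be trained on and evaluated with 10-fold cross-validation.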
