McBel-Plnc: A Deep Learning Model for Multiclass Multilabel Classification of Protein-lncRNA Interactions

One main function of long non-coding RNAs (lncRNAs) is to act as scaffolds that bring multiple proteins together into complexes. Most available prediction models for protein-RNA interactions, however, were proposed as binary classifiers, which limits them to predicting the interaction between a non-coding RNA and one individual RNA-binding protein (RBP) at a time. Hence, to predict whether a lncRNA acts as a scaffold, we treat the task as a multiclass multilabel classification problem. We selected high-confidence CLIP-seq data from the POSTAR2 database and augmented the data for RBP classes with few interacting lncRNAs. We then constructed a deep learning model for multiclass multilabel classification, called McBel-Plnc, based on a convolutional neural network (CNN) and long short-term memory (LSTM), training one model on each of five datasets randomly generated from the prepared data. On the test sets, the five models achieved a high macro-averaged precision of 0.9151 ± 0.0038 with a lower macro-averaged recall of 0.5786 ± 0.0208. The small standard deviations indicate that the model is stable. By comparison, iDeepE combined with the binary relevance method achieved a higher recall (0.6912) but a significantly lower precision (0.1987). These results suggest that our model is competent at predicting protein-lncRNA interactions, especially for lncRNAs targeted by multiple proteins, and that it has the potential to provide insights into lncRNA functions and molecular mechanisms.
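
To make the evaluation setting concrete, the following is a minimal sketch of how macro-averaged precision and recall can be computed for a multilabel prediction, where each lncRNA may be assigned several RBP classes at once. The function names, the toy label matrices, and the rule that two or more predicted RBPs mark a scaffold candidate are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple


def macro_precision_recall(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Macro-averaged precision and recall over label columns.

    Rows are lncRNAs, columns are RBP classes (1 = interaction).
    A label with an undefined score (zero denominator) contributes 0.0,
    which is a convention chosen for this sketch.
    """
    n_labels = len(y_true[0])
    precisions, recalls = [], []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    # Macro average: every RBP class counts equally, regardless of its size.
    return sum(precisions) / n_labels, sum(recalls) / n_labels


def scaffold_candidates(y_pred: List[List[int]], min_rbps: int = 2) -> List[int]:
    """Row indices of lncRNAs predicted to bind at least `min_rbps` RBPs.

    Hypothetical rule for illustration only; the threshold is an assumption.
    """
    return [i for i, row in enumerate(y_pred) if sum(row) >= min_rbps]


# Toy data: 4 lncRNAs x 3 hypothetical RBP classes.
y_true = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]

precision, recall = macro_precision_recall(y_true, y_pred)
candidates = scaffold_candidates(y_pred)
print(f"macro precision = {precision:.4f}, macro recall = {recall:.4f}")
print("scaffold candidates (row indices):", candidates)
```

In this toy case every predicted interaction is correct but some true interactions are missed, so precision is high while recall is lower, the same qualitative pattern the abstract reports. A binary relevance baseline would instead train one independent binary classifier per RBP class and stack the per-class outputs into the same kind of label matrix before scoring.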
