AC-Caps: Attention Based Capsule Network for Predicting RBP Binding Sites of LncRNA

Long non-coding RNA (lncRNA) is a class of non-coding RNA longer than 200 nucleotides that does not encode proteins. LncRNAs play key roles in many biological processes. Studying RNA-binding protein (RBP) binding sites on lncRNAs helps reveal epigenetic and post-transcriptional mechanisms, explore the physiological and pathological processes of cancer, and support new therapeutic breakthroughs. To improve the recognition rate of RBP binding sites and reduce experimental time and cost, many computational methods based on domain knowledge have been developed to predict RBP binding sites. However, these methods treat nucleotides independently and do not take nucleotide statistics into account. In this paper, we encode lncRNA sequences with a high-order statistics-based scheme and feed the encoded sequences into a hybrid deep learning architecture named AC-Caps, which consists of a joint processing layer (composed of an attention mechanism and a convolutional neural network) and a capsule network. AC-Caps was evaluated on 31 independent experimental datasets from 12 lncRNA-binding proteins. In these experiments, our method achieves an average area under the curve (AUC) of 0.967 and an average accuracy (ACC) of 92.5%, exceeding HOCCNNLB by 0.014 in AUC and 2.3% in ACC, iDeepS by 0.261 and 28.9%, and DeepBind by 0.189 and 21.8%, respectively. The results show that AC-Caps can reliably process large-scale data on RBP binding sites in lncRNAs and that its prediction performance is better than that of existing deep learning models. The source code of AC-Caps and the datasets used in this paper are available at https://github.com/JinmiaoS/AC-Caps .
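The abstract contrasts AC-Caps with methods that encode each nucleotide independently. As a rough illustration only, the Python sketch below shows one common form of high-order encoding, assuming a k-th order one-hot scheme in which each position is represented by a one-hot vector over the 4^k possible overlapping k-mers. The exact encoding used by AC-Caps may differ; the function name `high_order_one_hot` and the default `k=3` are hypothetical.

```python
from itertools import product

import numpy as np

NUCLEOTIDES = "ACGU"


def high_order_one_hot(seq: str, k: int = 3) -> np.ndarray:
    """Hypothetical k-th order one-hot encoding of an RNA sequence.

    Returns a (len(seq) - k + 1, 4**k) matrix: row i is the one-hot
    vector of the k-mer starting at position i.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style T as U
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}  # e.g. k=2: AA->0, AC->1, ...

    n_windows = len(seq) - k + 1
    if n_windows < 1:
        raise ValueError("sequence is shorter than k")

    encoded = np.zeros((n_windows, 4 ** k), dtype=np.float32)
    for pos in range(n_windows):
        kmer = seq[pos:pos + k]
        if kmer in index:  # k-mers containing N or other symbols stay all-zero
            encoded[pos, index[kmer]] = 1.0
    return encoded
```

For example, `high_order_one_hot("ACGUA", k=2)` returns a 4 × 16 matrix whose rows mark the dinucleotides AC, CG, GU, and UA. Raising k captures longer-range nucleotide dependencies at the cost of a wider (4^k columns) and sparser input.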

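The abstract does not detail the capsule-network stage. As a hedged illustration, the NumPy sketch below shows the squash nonlinearity and routing-by-agreement described by Sabour, Frosst, and Hinton (2017, "Dynamic Routing Between Capsules"), on which capsule layers are commonly built. Whether AC-Caps uses exactly this routing, and with how many iterations, is an assumption; the function names and the 3-iteration default are illustrative.

```python
import numpy as np


def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_routing(u_hat: np.ndarray, n_iter: int = 3) -> np.ndarray:
    """Routing-by-agreement between two capsule layers (illustrative sketch).

    u_hat: predictions of each input capsule i for each output capsule j,
           shape (n_in, n_out, d_out).
    Returns the output capsule vectors v, shape (n_out, d_out).
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, start uniform
    for _ in range(n_iter):
        c = softmax(b, axis=1)  # coupling coefficients sum to 1 over outputs
        s = np.einsum("ij,ijd->jd", c, u_hat)  # weighted sum of predictions
        v = squash(s)  # output capsules
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # reward agreeing predictions
    return v
```

Input capsules whose predictions agree with the emerging output vector receive larger coupling coefficients on the next iteration. The length of each output vector, kept below 1 by `squash`, can be read as the confidence that the entity represented by that capsule is present.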