CL-PMI: A Precursor MicroRNA Identification Method Based on Convolutional and Long Short-Term Memory Networks

MicroRNAs (miRNAs) are a major class of gene-regulating molecules that bind mRNAs. In mammals, they function mainly as translational repressors. Identifying miRNAs is therefore one of the most important problems in medical treatment. Many known precursor miRNAs (pre-miRNAs) have a hairpin loop structure that carries more structural features, whereas mature miRNAs are difficult to identify because of their short length. Most research therefore focuses on identifying pre-miRNAs. Most computational models rely on manual feature extraction and do not consider the sequential and spatial characteristics of pre-miRNAs, which results in a loss of information. In addition, because unidentified pre-miRNAs far outnumber known pre-miRNAs, the datasets are imbalanced, which degrades the performance of pre-miRNA identification methods. To overcome these limitations, we propose CL-PMI, a pre-miRNA identification algorithm based on a cascaded CNN-LSTM framework. We used a convolutional neural network (CNN) to extract features automatically and obtain the spatial information of pre-miRNAs. We also employed long short-term memory (LSTM) to capture the temporal characteristics of pre-miRNAs and to improve attention mechanisms for modeling long-term dependencies. Focal loss was used to mitigate the dataset imbalance. Compared with existing methods, CL-PMI achieved better performance on all datasets. The results demonstrate that this method can effectively identify pre-miRNAs by considering their spatial and sequential information simultaneously while also handling the imbalance in the datasets.
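
To illustrate how focal loss can reduce the influence of the majority class, the following is a minimal sketch of the binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), in plain Python. It is not the CL-PMI implementation: the function names are hypothetical, and the values gamma = 2.0 and alpha = 0.25 are placeholder defaults taken from the original focal loss formulation, not settings reported in this abstract.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for a single example (illustrative helper, not from CL-PMI).

    p: predicted probability of the positive class (e.g. a real pre-miRNA).
    y: true label, 1 for positive and 0 for negative.
    gamma, alpha: assumed placeholder hyperparameters.
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights the positive class by alpha, the negative by 1 - alpha.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0) for extremely confident wrong predictions.
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of predictions."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
    print(binary_focal_loss(0.9, 1), binary_focal_loss(0.1, 1))
```

With gamma = 0 the modulating factor disappears and the expression reduces to class-weighted cross-entropy, so gamma controls how strongly easy examples are down-weighted relative to hard ones.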
