Precursor microRNA Identification Using Deep Convolutional Neural Networks

Precursor microRNA (pre-miRNA) identification is the basis for identifying microRNAs (miRNAs), which play important roles in the post-transcriptional regulation of gene expression. In this paper, we propose a deep learning method that classifies whether a small non-coding RNA sequence is a pre-miRNA. Our method outperforms state-of-the-art methods on three benchmark datasets: the human, cross-species, and new datasets. The key idea is to use a matrix representation of the predicted secondary structure as input to a 2D convolutional network. The network learns optimized features automatically instead of relying on a large number of handcrafted features, as most existing methods do. Code and results are available at https://github.com/peace195/miRNA-identification-conv2D.
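To make the idea of a matrix representation concrete, the following is a minimal sketch of one possible encoding, not necessarily the one used in the paper. It assumes the predicted secondary structure is given in dot-bracket notation (as produced by common folding tools) and builds a symmetric binary base-pair matrix. The function name `dot_bracket_to_pair_matrix` and the 0/1 encoding are illustrative assumptions.

```python
def dot_bracket_to_pair_matrix(structure):
    """Build an n x n binary matrix from a dot-bracket string.

    Entry (i, j) is 1 when positions i and j are predicted to pair,
    and 0 otherwise. This encoding is an assumption for illustration;
    the paper's exact matrix representation may differ.
    """
    n = len(structure)
    matrix = [[0] * n for _ in range(n)]
    open_positions = []  # stack of indices of unmatched "("

    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure: extra ')'")
            i = open_positions.pop()
            # Base pairing is symmetric, so fill both (i, j) and (j, i).
            matrix[i][j] = 1
            matrix[j][i] = 1
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")

    if open_positions:
        raise ValueError("unbalanced structure: extra '('")
    return matrix


# A toy hairpin: positions 0-5 and 1-4 pair, positions 2-3 form the loop.
example_structure = "((..))"
pair_matrix = dot_bracket_to_pair_matrix(example_structure)
for row in pair_matrix:
    print(row)
```

A matrix of this kind can be treated as a single-channel image, which is what allows a 2D convolutional network to be applied to it. In practice, sequences of different lengths would need padding or resizing to a common shape before batching.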
