Nucleotide-level Convolutional Neural Networks for Pre-miRNA Classification

Based on differences in their biogenesis, miRNAs can be divided into canonical microRNAs and mirtrons. Compared with canonical microRNAs, mirtrons are less conserved and harder to identify. Besides stringent experiment-based annotation, many in silico methods have been developed to classify miRNAs. Although several machine learning classifiers have delivered high classification performance, all of these predictors depend heavily on the selection of hand-calculated features. Here, we introduce nucleotide-level convolutional neural networks (CNNs) for pre-miRNA classification. Using “one-hot” encoding and padding, pre-miRNAs were converted into matrices of the same shape. The convolution and max-pooling operations can then extract features automatically from pre-miRNA sequences. Evaluation on a test dataset showed that our models performed satisfactorily. Our investigation showed that it is feasible to apply CNNs to extract features from biological sequences. Because many CNN hyperparameters can still be tuned, we believe that the performance of nucleotide-level CNNs can be improved considerably in the future.
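The encoding and feature-extraction steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the alphabet order `ACGU`, right-side zero padding, the helper names `one_hot_pad` and `conv1d_maxpool`, the random untrained filters, the ReLU activation, and the toy sequences are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

ALPHABET = "ACGU"  # assumed symbol order; not specified in the abstract


def one_hot_pad(seq, max_len):
    """Encode an RNA sequence as a (max_len, 4) matrix, zero-padded on the right."""
    if len(seq) > max_len:
        raise ValueError("sequence longer than max_len")
    mat = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for i, base in enumerate(seq.upper().replace("T", "U")):
        j = ALPHABET.find(base)
        if j >= 0:  # unknown symbols (e.g. N) are left as all-zero rows
            mat[i, j] = 1.0
    return mat


def conv1d_maxpool(x, filters):
    """Slide filters along the sequence, apply ReLU, then global max-pool.

    x: (length, 4) one-hot matrix; filters: (n_filters, width, 4).
    Returns one pooled activation per filter, shape (n_filters,).
    As in most CNN libraries, this is a cross-correlation without padding.
    """
    n_filters, width, _ = filters.shape
    n_windows = x.shape[0] - width + 1
    # Stack every window of `width` consecutive nucleotides: (n_windows, width, 4)
    windows = np.stack([x[i:i + width] for i in range(n_windows)])
    # Dot product of each window with each filter: (n_windows, n_filters)
    scores = np.einsum("pwc,fwc->pf", windows, filters)
    return np.maximum(scores, 0.0).max(axis=0)


# Toy sequences, invented for this example (not real pre-miRNAs).
seqs = ["UGAGGUAGUAGGUUGUAUAGUU", "GUGAGCGGG"]
max_len = max(len(s) for s in seqs)
batch = np.stack([one_hot_pad(s, max_len) for s in seqs])

# Random filters stand in for weights that training would normally learn.
rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 3, 4)).astype(np.float32)
features = np.stack([conv1d_maxpool(x, filters) for x in batch])
print(batch.shape, features.shape)
```

Because every sequence is padded to the same length, the encoded inputs stack into a single array, and global max-pooling yields a fixed-size feature vector per sequence regardless of its original length. In a trained model these pooled features would feed a classification layer.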
