High precision in microRNA prediction: A novel genome-wide approach with convolutional deep residual networks

MicroRNAs (miRNAs) are small non-coding RNAs that play a key role in the regulation of gene expression. Their importance is now widely acknowledged, and computational methods are needed to predict novel miRNA candidates precisely. This task can be done by searching for homologs with sequence alignment tools, but the results are restricted to sequences that are very similar to known miRNA precursors (pre-miRNAs). Moreover, these methods do not take into account a very important property of pre-miRNAs: their secondary structure. To fill this gap, many machine learning approaches have been proposed in recent years. However, these methods are generally tested under very controlled conditions. When they are used under real conditions, false positives increase and precision falls well below the published values. This work provides a novel approach to the computational prediction of pre-miRNAs: a convolutional deep residual neural network (mirDNN). The model was tested on the full genomes of several animal and plant species, achieving a precision up to 5 times larger than that of other approaches at the same recall rates. Furthermore, a novel validation methodology was used to ensure that the performance reported in this study can be effectively achieved when using mirDNN on novel species. To provide fast and easy access to mirDNN, a web demo is available at http://sinc.unl.edu.ar/web-demo/mirdnn/. The demo can process FASTA files with multiple sequences to calculate the prediction scores and generate nucleotide importance plots. FULL SOURCE CODE: http://sourceforge.net/projects/sourcesinc/files/mirdnn and https://github.com/cyones/mirDNN. CONTACT: gstegmayer@sinc.unl.edu.ar.
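
The drop in precision under real conditions follows from the definition precision = TP / (TP + FP). The minimal sketch below illustrates it for a classifier whose recall and false positive rate stay fixed while the number of negative candidates grows. The function name and all numbers are hypothetical, chosen only for illustration. They are not results from this study and not part of mirDNN.

```python
def precision_at(recall, false_positive_rate, n_pos, n_neg):
    """Expected precision for a classifier with a fixed recall and
    false positive rate, applied to n_pos positives and n_neg negatives."""
    true_positives = recall * n_pos
    false_positives = false_positive_rate * n_neg
    return true_positives / (true_positives + false_positives)


# Illustrative values only (assumptions, not taken from the paper).
RECALL = 0.9
FPR = 0.01
N_POS = 1_000

# Controlled test set: as many negatives as positives.
balanced = precision_at(RECALL, FPR, N_POS, n_neg=1_000)

# Genome-wide setting: negatives outnumber positives 1000 to 1.
genome_wide = precision_at(RECALL, FPR, N_POS, n_neg=1_000_000)

print(f"balanced:    {balanced:.3f}")
print(f"genome-wide: {genome_wide:.3f}")
```

With these assumed values the same classifier goes from a precision of about 0.99 on the balanced set to about 0.08 in the imbalanced one, which is why results obtained on controlled test sets can overstate genome-wide performance.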

[1]  Georgina Stegmayer,et al.  Complexity measures of the mature miRNA for improving pre-miRNAs prediction , 2019, Bioinform..

[2]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Diego H. Milone,et al.  Genome-wide hairpins datasets of animals and plants for novel miRNA prediction , 2019, Data in brief.

[4]  Peter F. Stadler,et al.  RNA folding with hard and soft constraints , 2016, Algorithms for Molecular Biology.

[5]  Ujjwal Maulik,et al.  Machine learning integrated ensemble of feature selection methods followed by survival analysis for predicting breast cancer subtype specific miRNA biomarkers , 2021, Comput. Biol. Medicine.

[6]  Fabian J Theis,et al.  Deep learning: new computational modelling techniques for genomics , 2019, Nature Reviews Genetics.

[7]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[8]  Xueming Zheng,et al.  Nucleotide-level Convolutional Neural Networks for Pre-miRNA Classification , 2019, Scientific Reports.

[9]  Yanni Sun,et al.  Fast and accurate microRNA search using CNN , 2019, BMC Bioinformatics.

[10]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Yang Yang,et al.  Trends in the development of miRNA bioinformatics tools , 2019, Briefings Bioinform..

[12]  Georgina Stegmayer,et al.  Deep Neural Architectures for Highly Imbalanced Data in Bioinformatics , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[13]  Li Li,et al.  Computational approaches for microRNA studies: a review , 2010, Mammalian Genome.

[14]  Milton Pividori,et al.  A very simple and fast way to access and validate algorithms in reproducible research , 2016, Briefings Bioinform..

[15]  David K. Gifford,et al.  Convolutional neural network architectures for predicting DNA–protein binding , 2016, Bioinform..

[16]  Ana Kozomara,et al.  miRBase: from microRNA sequences to function , 2018, Nucleic Acids Res..

[17]  D. Bartel MicroRNAs Genomics, Biogenesis, Mechanism, and Function , 2004, Cell.

[18]  Georgina Stegmayer,et al.  miRNAfe: A comprehensive tool for feature extraction in microRNA prediction , 2015, Biosyst..

[19]  Noorul Amin,et al.  Evaluation of deep learning in non-coding RNA classification , 2019, Nature Machine Intelligence.

[20]  L. Hood,et al.  A Review of Computational Tools in microRNA Discovery , 2013, Front. Genet..

[21]  Tie-Yan Liu,et al.  LightGBM: A Highly Efficient Gradient Boosting Decision Tree , 2017, NIPS.

[22]  Marek Sikora,et al.  HuntMi: an efficient and taxon-specific approach in pre-miRNA identification , 2013, BMC Bioinformatics.

[23]  Seokjun Seo,et al.  DeepFam: deep learning based alignment-free method for protein family modeling and prediction , 2018, Bioinform..

[24]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[25]  Jan Baumbach,et al.  On the performance of pre-microRNA detection algorithms , 2017, Nature Communications.

[26]  Andrew D. Johnson,et al.  Genome-wide Identification of microRNA Expression Quantitative Trait Loci , 2015, Nature Communications.

[27]  Takaya Saito,et al.  The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets , 2015, PloS one.

[28]  Georgina Stegmayer,et al.  Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning , 2020, Briefings Bioinform..

[29]  Abhijit Sarma,et al.  An in-silico approach to study the possible interactions of miRNA between human and SARS-CoV2 , 2020, Computational Biology and Chemistry.

[30]  Alexander Schliep,et al.  The discriminant power of RNA features for pre-miRNA recognition , 2013, BMC Bioinformatics.

[31]  Milton Pividori,et al.  Predicting novel microRNA: a comprehensive comparison of machine learning approaches , 2019, Briefings Bioinform..

[32]  Vaibhav Shukla,et al.  A compilation of Web-based research tools for miRNA analysis , 2017, Briefings in functional genomics.

[33]  Yusuke Yamamoto,et al.  Loss of microRNA-27b contributes to breast cancer stem cell generation by activating ENPP1 , 2015, Nature Communications.

[34]  D. Searls,et al.  Robots in invertebrate neuroscience , 2002, Nature.

[35]  Seyed Hamid Aghaee-Bakhtiari,et al.  Web-based tools for miRNA studies analysis , 2020, Comput. Biol. Medicine.

[36]  Georgina Stegmayer,et al.  Genome-wide pre-miRNA discovery from few labeled examples , 2018, Bioinform..