High precision in microRNA prediction: a novel genome-wide approach based on convolutional deep residual networks

Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that play a key role in the regulation of gene expression. Their importance is now widely acknowledged, and precise computational prediction of novel candidates is still much needed. Candidates can be found by searching for homologs with sequence alignment tools, but this restricts the search to sequences very similar to known miRNA precursors (pre-miRNAs). Furthermore, these methods do not take into account other important properties of pre-miRNAs, such as the secondary structure. Many machine learning approaches have been proposed in recent years to fill this gap, but they were tested under very controlled conditions that do not hold, for example, when predicting in newly sequenced genomes where no miRNAs are known. When these methods are used under real conditions, the precision achieved is far below the published figures.

Results: This work provides a novel approach to the computational prediction of pre-miRNAs: a convolutional deep residual neural network. The proposed model was tested on several complete genomes of animals and plants, achieving a precision up to 5 times higher than other approaches at the same recall rates. In addition, a novel validation methodology is used to ensure that the reported performance can be achieved when the method is applied to new, unknown species.

Availability: To provide fast and easy access to mirDNN, a web demo is available here. It can process FASTA files with multiple sequences to calculate the prediction scores, and can generate the nucleotide importance plots. The full source code of this project is available here and here.

Contact: cyones@sinc.unl.edu.ar
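The Results compare methods by their precision at the same recall rates. As a minimal sketch of what that comparison means, and not the evaluation code used by the authors, the hypothetical helper `precision_at_recall` below ranks candidates by score and reports the precision at the first cutoff where the target recall is reached. The scores and labels are made-up toy values, and tied scores are not treated specially.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the first ranking cutoff whose recall reaches target_recall.

    scores: prediction scores (higher means more likely a pre-miRNA).
    labels: 1 for a known pre-miRNA, 0 for any other sequence.
    Hypothetical helper for illustration only.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank candidates from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        tp += label
        if tp / n_pos >= target_recall:
            # 'rank' candidates were accepted, 'tp' of them are true positives.
            return tp / rank
    return tp / len(ranked)


# Toy data (assumed): 8 candidate sequences, 2 of them true pre-miRNAs.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.2, 0.1, 0.8, 0.3, 0.4, 0.05, 0.15]  # ranks both positives first
scores_b = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]    # second positive is ranked 4th

print(precision_at_recall(scores_a, labels, 1.0))
print(precision_at_recall(scores_b, labels, 1.0))
```

Fixing the recall first makes the two scorers comparable: both must recover the same fraction of known pre-miRNAs, and the one that needs fewer false positives to do so has the higher precision. In genome-wide settings the negatives vastly outnumber the positives, which is why precision is the quantity of interest here.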
