Mirnovo: genome-free prediction of microRNAs from small RNA sequencing data and single-cells using decision forests

Abstract The discovery of microRNAs (miRNAs) remains an important problem, particularly given the growth of high-throughput sequencing, cell sorting and single-cell biology. While a large number of miRNAs have already been annotated, many more may be expressed only in very particular cell types and so remain elusive. Sequencing allows us to quickly and accurately identify the expression of known miRNAs from small RNA-Seq data. The biogenesis of miRNAs leads to very specific characteristics observed in their sequences. In brief, miRNAs usually have a well-defined 5′ end and a more flexible 3′ end with the possibility of 3′ tailing events, such as uridylation. Previous approaches to the prediction of novel miRNAs usually involve the analysis of structural features of miRNA precursor hairpin sequences obtained from genome sequence. We surmised that it may be possible to identify miRNAs by using these biogenesis features observed directly from sequenced reads, alone or in addition to structural analysis from genome data. To this end, we have developed mirnovo, a machine-learning-based algorithm that is able to identify known and novel miRNAs in animals and plants directly from small RNA-Seq data, with or without a reference genome. This method performs comparably to existing tools but is simpler to use and has a reduced run time. Its performance and accuracy have been tested on multiple datasets, including species with poorly assembled genomes, RNaseIII (Drosha and/or Dicer) deficient samples and single cells (at both embryonic and adult stages).
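The 5′/3′ end characteristics described in the abstract can be illustrated with a minimal Python sketch. This is not mirnovo's actual feature set or code: the function name `end_consistency`, the `(start, end, count)` read representation, and the idea of measuring positions relative to a cluster of similar reads are all assumptions made purely for illustration. The sketch computes the fraction of reads in a cluster that share the most common 5′ start and the most common 3′ end; a miRNA-like cluster would be expected to score close to 1 at the 5′ end and lower at the 3′ end.

```python
from collections import Counter


def end_consistency(reads):
    """Return (five_prime_fraction, three_prime_fraction) for one read cluster.

    `reads` is a list of (start, end, count) tuples: hypothetical positions of
    each distinct read relative to the cluster, plus its observed read count.
    Each fraction is the share of total reads at the most common end position.
    """
    five_prime = Counter()
    three_prime = Counter()
    for start, end, count in reads:
        five_prime[start] += count
        three_prime[end] += count

    total = sum(five_prime.values())
    if total == 0:
        raise ValueError("cluster contains no reads")

    five_fraction = five_prime.most_common(1)[0][1] / total
    three_fraction = three_prime.most_common(1)[0][1] / total
    return five_fraction, three_fraction


# Illustrative (made-up) cluster: a dominant 5' start at position 0,
# with several 3' end variants such as might arise from tailing or trimming.
example_cluster = [(0, 22, 80), (0, 23, 15), (0, 21, 4), (1, 22, 1)]
five_fraction, three_fraction = end_consistency(example_cluster)
print(f"5' consistency: {five_fraction:.2f}, 3' consistency: {three_fraction:.2f}")
```

Values of this kind could serve as inputs to a classifier such as a decision forest, although which features mirnovo actually uses is not stated in the abstract.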
