Unification of miRNA and isomiR research: the mirGFF3 format and the mirtop API

MOTIVATION: MicroRNAs (miRNAs) are small RNA molecules (∼22 nucleotides long) involved in post-transcriptional gene regulation. Advances in high-throughput sequencing technologies led to the discovery of isomiRs, which are miRNA sequence variants. While many miRNA-seq analysis tools exist, the diversity of their output formats hinders accurate comparisons between tools and precludes data sharing and the development of common downstream analysis methods.

RESULTS: To overcome this situation, we present a community-based project, miRTOP (miRNA Transcriptomic Open Project), which works towards the optimization of miRNA analyses. The aim of miRTOP is to promote the development of downstream isomiR analysis tools that are compatible with existing detection and quantification tools. Based on the existing GFF3 format, we first created a new standard format, mirGFF3, for the output of miRNA/isomiR detection and quantification results from small RNA-seq data. Additionally, we developed a command-line Python tool, mirtop, to create and manage the mirGFF3 format. Currently, mirtop can convert the outputs of commonly used pipelines, such as seqbuster, isomiR-SEA, sRNAbench, and Prost!, as well as BAM files, into mirGFF3. Some tools, such as miRge2.0, IsoMIRmap, and OptimiR, have also incorporated the mirGFF3 format directly into their code. The format's open architecture enables any tool or pipeline to output or convert results into mirGFF3. Collectively, this isomiR categorization system, along with the accompanying mirGFF3 format and mirtop API, provides a comprehensive solution for the standardization of miRNA and isomiR annotation, enabling data sharing, reporting, comparative analyses, and benchmarking, while promoting the development of common miRNA methods focusing on downstream steps of miRNA detection, annotation, and quantification.

AVAILABILITY: https://github.com/miRTop/mirGFF3/ and https://github.com/miRTop/mirtop.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
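
Because mirGFF3 is described as building on GFF3, a downstream tool could start from a generic reader for GFF3-style records (nine tab-separated columns, with the last column holding attribute pairs). The following is a minimal sketch under that assumption only: it does not use the mirtop API, the record and attribute names in the example line are invented for illustration, and the authoritative list of mirGFF3 attributes and their exact syntax is the specification in the mirGFF3 repository. The attribute parser accepts both `key=value` and `key value` pairs so as not to presume either style.

```python
from dataclasses import dataclass

GFF3_COLUMNS = 9  # standard GFF3 records have nine tab-separated columns


@dataclass
class Gff3Record:
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_attributes(field):
    """Split the ninth column into a dict of attribute pairs.

    Accepts 'key=value' or 'key value' items separated by semicolons.
    """
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        if "=" in item:
            key, value = item.split("=", 1)
        else:
            key, _, value = item.partition(" ")
        attributes[key.strip()] = value.strip()
    return attributes


def parse_gff3_line(line):
    """Parse one GFF3-style data line into a Gff3Record."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != GFF3_COLUMNS:
        raise ValueError(f"expected {GFF3_COLUMNS} columns, got {len(columns)}")
    seqid, source, type_, start, end, score, strand, phase, attrs = columns
    return Gff3Record(seqid, source, type_, int(start), int(end),
                      score, strand, phase, parse_attributes(attrs))


# Invented example record; names and attribute keys are hypothetical.
example_line = "\t".join([
    "example-miRNA", "example_db", "isomiR", "4", "25", ".", "+", ".",
    "Name=example-miRNA;Variant=hypothetical_tag;Expression=10,20",
])

record = parse_gff3_line(example_line)
print(record.type, record.start, record.end, record.attributes)
```

Keeping the attribute column as a plain dictionary means the reader does not need to know in advance which attributes a given tool emits, which matches the abstract's point that any tool or pipeline can output or convert results into the shared format.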
