Manatee: detection and quantification of small non-coding RNAs from next-generation sequencing data

Small non-coding RNAs (sncRNAs) play important roles in health and disease. Next-generation sequencing technologies are considered the most powerful and versatile methodologies for exploring small RNA (sRNA) transcriptomes in diverse experimental and clinical studies. Small RNA-Seq data analysis has proved challenging because of the non-unique genomic origin, short length and abundant post-transcriptional modifications of sRNA species. Here we present Manatee, an algorithm for quantification of sRNA classes and detection of uncharacterized expressed non-coding loci. Manatee adopts a novel approach to abundance estimation of genomic reads that combines sRNA annotation with reliable alignment density information and extensive read salvation. Comparison of Manatee against state-of-the-art implementations on real and simulated data sets demonstrates its superior accuracy in quantifying diverse sRNA classes, while also providing insights into unannotated expressed loci. Manatee is user-friendly, easily embedded in pipelines, and provides a simplified output suitable for direct use in downstream analyses and functional studies.
