mmannot: How to improve small-RNA annotation?

High-throughput sequencing can provide the genome-wide distribution of small non-coding RNAs in a single experiment, and it has contributed greatly to the identification and understanding of these RNAs over the last decade. Small non-coding RNAs span a wide collection of classes, such as microRNAs, tRNA-derived fragments, small nucleolar RNAs and small nuclear RNAs, to name a few. As usual in RNA-seq studies, the sequencing step is followed by a feature quantification step: when a genome is available, the reads are aligned to the genome, their genomic positions are compared to the already available annotations, and the corresponding features are quantified. However, a problem arises when many reads map to several positions, and while different strategies exist to circumvent this problem, all of them are biased. In this article, we present a new strategy that compares all the reads that map to several positions, together with their annotations when available. In many cases, all the hits of a read co-localize with the same feature annotation (a duplicated miRNA or a duplicated gene, for instance), so the read can be assigned to that annotation unambiguously. When different annotations exist for a given read, we propose to merge the existing features and provide the counts for the merged features. This new strategy has been implemented in a tool, mmannot, freely available at https://github.com/mzytnicki/mmannot.
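The merging strategy described above can be illustrated with a minimal sketch. It is not mmannot's actual implementation: the function name `quantify_multimappers`, the input layout (one list of annotation labels per read, one label per hit), the `--` separator for merged features, and the rule that unannotated hits are ignored when at least one hit is annotated are all assumptions made for illustration.

```python
from collections import Counter

def quantify_multimappers(read_hits):
    """Count reads per feature, merging features for ambiguous reads.

    read_hits maps a read name to a list with one entry per hit:
    the label of the annotation overlapped by that hit, or None when
    the hit overlaps no annotation (hypothetical input layout).
    """
    counts = Counter()
    for hits in read_hits.values():
        # Distinct annotations seen across all hits of this read.
        # Assumption: unannotated hits are ignored if any hit is annotated.
        labels = sorted({label for label in hits if label is not None})
        if not labels:
            counts["unannotated"] += 1
        else:
            # One label: all hits agree, so the read is unambiguous.
            # Several labels: count the read for the merged feature.
            counts["--".join(labels)] += 1
    return counts

# Toy data: labels and read names are invented for this example.
example_hits = {
    "read1": ["miRNA", "miRNA"],       # duplicated miRNA, same annotation
    "read2": ["miRNA", "gene"],        # two different annotations
    "read3": [None, None],             # no annotation at any hit
    "read4": ["gene", "miRNA", None],  # mixed annotated and unannotated hits
}
example_counts = quantify_multimappers(example_hits)
```

In this toy run, `read1` is counted for `miRNA`, while `read2` and `read4` are both counted for the merged feature `gene--miRNA` rather than being split or assigned arbitrarily to one of the two annotations.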
