Improved Placement of Multi-Mapping Small RNAs

High-throughput sequencing of small RNAs (sRNA-seq) is a popular method for discovering and annotating microRNAs (miRNAs), endogenous short interfering RNAs (siRNAs) and Piwi-associated RNAs (piRNAs). A key step in sRNA-seq data analysis is alignment to a reference genome. sRNA-seq libraries often contain a high proportion of reads that align to multiple genomic locations, which makes their true origins difficult to determine. Commonly used sRNA-seq alignment methods yield either very low precision (when one alignment is chosen at random) or very low sensitivity (when multi-mapping reads are ignored). Here, we describe and test an sRNA-seq alignment strategy that uses local genomic context to guide the placement of multi-mapped sRNA-seq reads. In tests with simulated sRNA-seq data, this local-weighting method outperformed other alignment strategies on three different plant genomes. Experimental analyses with real sRNA-seq data also indicate superior performance of local-weighting methods for both plant miRNAs and heterochromatic siRNAs. The local-weighting methods we have developed are implemented as part of the sRNA-seq analysis program ShortStack, which is freely available under a general public license. Improved genome alignments of sRNA-seq data should increase the quality of downstream analyses and genome annotation efforts.
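
The idea of using local genomic context to place a multi-mapped read can be illustrated with a minimal sketch. This is not ShortStack's implementation. It assumes that each candidate location is weighted by the number of uniquely mapped reads within a fixed window around it, that the placement is drawn in proportion to those weights, and that the read falls back to a uniformly random placement when no candidate has nearby unique reads. The function names, the 50-nt window, and the fallback rule are all hypothetical choices made for illustration.

```python
import random

def local_weights(candidates, unique_positions, window=50):
    """Count uniquely mapped reads near each candidate location.

    candidates: list of (chrom, pos) tuples for one multi-mapped read.
    unique_positions: dict mapping chrom -> list of positions of
        uniquely mapped reads (assumed input format).
    window: half-width of the neighbourhood in nt (assumed value).
    """
    weights = []
    for chrom, pos in candidates:
        nearby = sum(
            1 for p in unique_positions.get(chrom, ())
            if abs(p - pos) <= window
        )
        weights.append(nearby)
    return weights

def place_read(candidates, unique_positions, window=50, rng=None):
    """Choose one candidate location, weighted by local unique-read density."""
    rng = rng or random.Random()
    weights = local_weights(candidates, unique_positions, window)
    if sum(weights) == 0:
        # No local context available: fall back to a random placement
        # (assumption; other policies, such as discarding the read, are possible).
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical example: one read with two candidate placements.
candidates = [("chr1", 100), ("chr2", 500)]
unique_positions = {"chr1": [90, 95, 110, 400], "chr2": []}
print(local_weights(candidates, unique_positions))
print(place_read(candidates, unique_positions, rng=random.Random(0)))
```

In this toy case only the chr1 candidate has uniquely mapped neighbours, so it receives all of the weight. Where several candidates have support, the draw is probabilistic, which is a middle ground between random placement and discarding multi-mapped reads.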
