Quantification of experimentally induced nucleotide conversions in high-throughput sequencing datasets

Background
Methods to read out naturally occurring or experimentally introduced nucleic acid modifications are emerging as powerful tools to study dynamic cellular processes. The recovery, quantification and interpretation of such events in high-throughput sequencing datasets demand specialized bioinformatics approaches.

Results
Here, we present Digital Unmasking of Nucleotide conversions in K-mers (DUNK), a data analysis pipeline for quantifying nucleotide conversions in high-throughput sequencing datasets. Using experimentally generated and simulated datasets, we demonstrate that DUNK maintains constant mapping rates irrespective of nucleotide-conversion rates, promotes the recovery of multimapping reads, and employs single nucleotide polymorphism (SNP) masking to uncouple true SNPs from nucleotide conversions, enabling a robust and sensitive quantification of nucleotide conversions. As a first application, we implement this strategy as SLAM-DUNK for the analysis of SLAMseq profiles, in which 4-thiouridine-labeled transcripts are detected based on T > C conversions. As readouts, SLAM-DUNK provides both raw counts of nucleotide-conversion-containing reads and an estimate of the fraction of labeled transcripts that is normalized for base content and read coverage.

Conclusion
Beyond providing a readily accessible tool for analyzing SLAMseq and related time-resolved RNA sequencing methods (TimeLapse-seq, TUC-seq), DUNK establishes a broadly applicable strategy for quantifying nucleotide conversions.
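The abstract does not spell out how conversions are tallied. As an illustration only, the following Python sketch shows one way to count T > C conversions per read while masking SNP positions, and to report both a raw count of converted reads and a conversion rate normalized to the number of reference T positions covered by reads. The names `AlignedRead`, `count_tc_conversions` and `summarize` are hypothetical and are not part of SLAM-DUNK. The sketch assumes gapless (indel-free), forward-strand alignments that lie entirely within the reference, plus a precomputed set of SNP positions.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass
class AlignedRead:
    # Hypothetical minimal read record, not SLAM-DUNK's data model.
    start: int      # 0-based reference position of the first aligned base
    sequence: str   # read bases; assumed gapless and forward-strand


def count_tc_conversions(read: AlignedRead, reference: str,
                         snp_positions: Set[int]) -> Tuple[int, int]:
    """Return (T>C conversions, reference T positions covered) for one read.

    Positions in snp_positions are skipped entirely, so a true SNP
    contributes to neither the numerator nor the denominator.
    """
    conversions = 0
    t_covered = 0
    for offset, read_base in enumerate(read.sequence):
        pos = read.start + offset
        if pos in snp_positions:
            continue  # masked: a mismatch here may be a genuine SNP
        if reference[pos] == "T":
            t_covered += 1
            if read_base == "C":
                conversions += 1
    return conversions, t_covered


def summarize(reads: Iterable[AlignedRead], reference: str,
              snp_positions: Set[int],
              min_conversions: int = 1) -> Tuple[int, float]:
    """Return (reads with >= min_conversions T>C, T-normalized conversion rate)."""
    converted_reads = 0
    total_conversions = 0
    total_t_covered = 0
    for read in reads:
        conv, t_cov = count_tc_conversions(read, reference, snp_positions)
        total_conversions += conv
        total_t_covered += t_cov
        if conv >= min_conversions:
            converted_reads += 1
    # Normalizing by covered T positions accounts for both T content and coverage.
    rate = total_conversions / total_t_covered if total_t_covered else 0.0
    return converted_reads, rate


reference = "ATTGCTTA"
reads = [AlignedRead(0, "ACTGCTTA"), AlignedRead(2, "TGCCCA"), AlignedRead(1, "TTGC")]
snp_positions = {5}  # position 5 is treated as a known SNP and masked
print(summarize(reads, reference, snp_positions))  # 2 converted reads, rate 2/7
```

In this toy example, masking position 5 removes one apparent T > C mismatch from the second read. Without the mask, the same data would yield three conversions over nine covered T positions. Excluding masked sites from both counts keeps a homozygous T/C SNP from being mistaken for metabolic labeling.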
