ATAC-seq with unique molecular identifiers improves quantification and footprinting

ATAC-seq (Assay for Transposase-Accessible Chromatin with high-throughput sequencing) provides an efficient way to analyze nucleosome-free regions and has been widely applied to identify transcription factor footprints. Both applications rely on accurate quantification of insertion events of the hyperactive transposase Tn5. However, because the library is PCR-amplified, standard ATAC-seq cannot accurately distinguish identical Tn5 insertion events that arose independently from PCR duplicates of a single event. Removing PCR duplicates based on mapping coordinates alone therefore discards genuine insertion events, and the resulting bias grows with chromatin accessibility, so highly accessible regions are affected most. To overcome this limitation, we establish UMI-ATAC-seq, a technique that incorporates unique molecular identifiers (UMIs) into the standard ATAC-seq procedure. In our study, UMI-ATAC-seq rescued about 20% of reads that standard ATAC-seq misclassifies as PCR duplicates, which helped identify at least 50% more footprints. We demonstrate that UMI-ATAC-seq quantifies chromatin accessibility more accurately and significantly improves the sensitivity of transcription factor footprint identification. We developed an analytic pipeline to facilitate the application of UMI-ATAC-seq; it is available at https://github.com/tzhu-bio/UMI-ATAC-seq.
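The difference between coordinate-based and UMI-aware duplicate removal can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Read` record, the function names, and the toy reads are hypothetical, and UMIs are compared by exact match without any sequencing-error correction.

```python
from collections import namedtuple

# Hypothetical minimal read record: fragment coordinates plus its UMI.
Read = namedtuple("Read", ["chrom", "start", "end", "umi"])


def dedup_by_coordinates(reads):
    """Keep one read per (chrom, start, end), as in standard ATAC-seq."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.end)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def dedup_by_coordinates_and_umi(reads):
    """Keep one read per (chrom, start, end, umi).

    Assumption: UMIs are compared by exact match; real data would
    usually also need to tolerate sequencing errors in the UMI.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.end, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy data (invented for illustration only).
reads = [
    Read("chr1", 100, 250, "ACGT"),
    Read("chr1", 100, 250, "ACGT"),  # same coordinates, same UMI: PCR duplicate
    Read("chr1", 100, 250, "TTAG"),  # same coordinates, different UMI: independent insertion
    Read("chr1", 400, 520, "GGCA"),
]

coord_only = dedup_by_coordinates(reads)
umi_aware = dedup_by_coordinates_and_umi(reads)

# Reads that coordinate-only deduplication would have discarded by mistake.
rescued = len(umi_aware) - len(coord_only)
print(len(coord_only), len(umi_aware), rescued)
```

In this toy case the third read shares coordinates with the first but carries a different UMI, so only the UMI-aware key retains it as a separate insertion event.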
