Improving ATAC-seq Data Analysis with AIAP, a Quality Control and Integrative Analysis Package

ATAC-seq is a widely used technique for investigating genome-wide chromatin accessibility. The recently published Omni-ATAC-seq protocol substantially improves the signal-to-noise ratio and reduces the number of input cells required. High-quality data are critical for accurate analysis. Several tools have been developed to assess sequencing quality and insertion size distribution for ATAC-seq data; however, key quality control (QC) metrics for accurately determining the quality of ATAC-seq data have not yet been established. Here, we optimized the analysis strategy for ATAC-seq and defined a series of QC metrics, including reads under peak ratio (RUPr), background (BG), promoter enrichment (ProEn), subsampling enrichment (SubEn), and other measurements. We incorporated these QC tests into our recently developed ATAC-seq Integrative Analysis Package (AIAP) to provide a complete ATAC-seq analysis system that covers quality assurance, improved peak calling, and downstream differential analysis. Processing paired-end ATAC-seq datasets with AIAP significantly improved sensitivity (by 20%-60%) in both peak calling and differential analysis. AIAP is packaged as a Docker/Singularity image and generates a comprehensive QC report from a single command-line execution. We used ENCODE ATAC-seq data for benchmarking and to generate QC recommendations, and we developed qATACViewer for user-friendly interaction with the QC report.
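As a rough illustration of what a metric such as reads under peak ratio (RUPr) measures, the sketch below computes the fraction of reads that fall inside called peaks. The abstract does not give AIAP's exact formula, so the definition used here (reads in peaks divided by total reads, on a single chromosome, with half-open intervals) is an assumption, and the function names `merge_intervals` and `reads_under_peak_ratio` are hypothetical, not part of AIAP.

```python
from bisect import bisect_right


def merge_intervals(peaks):
    """Merge overlapping half-open (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def reads_under_peak_ratio(read_positions, peaks):
    """Assumed RUPr: fraction of read positions inside any peak.

    Hypothetical helper for illustration only; works on one chromosome
    and treats each read as a single coordinate.
    """
    if not read_positions:
        return 0.0
    merged = merge_intervals(peaks)
    starts = [start for start, _ in merged]
    hits = 0
    for pos in read_positions:
        # Find the last merged peak that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits / len(read_positions)


# Toy data: three peaks (two overlapping) and eight read positions.
peaks = [(100, 200), (150, 300), (500, 600)]
reads = [50, 120, 250, 299, 300, 550, 700, 800]
rupr = reads_under_peak_ratio(reads, peaks)
print(f"RUPr = {rupr:.2f}")
```

In practice the reads and peaks would come from an alignment file and a peak caller's output rather than hand-written lists; the toy coordinates here only show the arithmetic.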
