Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets via protocol-specific bias modeling

DNase-seq and ATAC-seq are widely used methods for assaying open chromatin regions genome-wide. The single-nucleotide resolution of DNase-seq has been further exploited to infer transcription factor binding sites (TFBSs) in regulatory regions via footprinting. Recent studies have demonstrated the sequence bias of DNase I and its adverse effects on footprinting efficiency. However, footprinting and the impact of sequence bias have not been extensively studied for ATAC-seq. Here, we undertake a systematic comparison of the two methods and show that a modification to the ATAC-seq protocol increases its yield and its agreement with DNase-seq data from the same cell line. We demonstrate that the two methods have distinct sequence biases, and we correct for these protocol-specific biases when performing footprinting. Despite differences in footprint shapes, the locations of the inferred footprints in ATAC-seq and DNase-seq are largely concordant. However, the protocol-specific sequence biases, in conjunction with the sequence content of TFBSs, affect the discrimination of footprints from background, so that one method outperforms the other for some TFs. Finally, we address the sequencing depth required for reproducible identification of open chromatin regions and TF footprints.
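
The abstract does not specify how the protocol-specific bias is modeled. As an illustration only, one common way to think about cleavage bias is as a k-mer enrichment: how often a k-mer is observed around cut sites relative to how often it occurs in the sequence. The sketch below makes that assumption. The function names `kmer_bias` and `bias_corrected`, the window size `k`, and the simple ratio estimator are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def kmer_bias(seq, cuts, k=6):
    """Estimate a per-k-mer bias as (frequency at cut sites) / (background frequency).

    seq  : DNA string
    cuts : per-position cut counts, same length as seq
    k    : size of the window placed around each position (assumed, not from the paper)
    """
    half = k // 2
    background = Counter()
    observed = Counter()
    # Visit every position whose k-mer window lies fully inside the sequence.
    for i in range(half, len(seq) - k + half + 1):
        kmer = seq[i - half:i - half + k]
        background[kmer] += 1
        observed[kmer] += cuts[i]
    total_bg = sum(background.values())
    total_obs = sum(observed.values())
    if total_obs == 0:
        raise ValueError("no cuts fall inside the usable window range")
    return {
        kmer: (observed[kmer] / total_obs) / (background[kmer] / total_bg)
        for kmer in background
    }


def bias_corrected(seq, cuts, bias, k=6):
    """Divide each cut count by the bias of its surrounding k-mer.

    Positions too close to the sequence ends to carry a full window are left at 0.0.
    """
    half = k // 2
    corrected = [0.0] * len(seq)
    for i in range(half, len(seq) - k + half + 1):
        b = bias.get(seq[i - half:i - half + k], 1.0)
        corrected[i] = cuts[i] / b if b > 0 else 0.0
    return corrected
```

Under this sketch, a separate bias table would be estimated for each protocol (one from DNase-seq cuts, one from ATAC-seq cuts), and each dataset would be corrected with its own table before footprints are called.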
