A Comparison of Peak Callers Used for DNase-Seq Data

Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, which motivated us to assess and compare their performance. In this study, we assess four published, publicly available peak-calling algorithms used for DNase-seq data analysis (F-seq, Hotspot, MACS and ZINBA) at a range of signal thresholds on two published DNase-seq datasets for three cell types. The results are benchmarked against an independent dataset of regulatory regions derived from ENCODE in vivo transcription factor binding data for each cell type. The level of overlap between the peak regions reported by each algorithm and this ENCODE-derived reference set is used to assess the sensitivity and specificity of the algorithms. Our study suggests that F-seq has slightly higher sensitivity than the next best algorithms. Hotspot and the ChIP-seq-oriented method MACS both perform competitively when used with their default parameters. However, the generic peak finder ZINBA appears to be less sensitive than the other three. We also assess the accuracy of each algorithm over a range of signal thresholds. In particular, we show that the accuracy of F-seq can be considerably improved by using a threshold setting that differs from the default value.
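To make the overlap-based evaluation concrete, the following is a minimal sketch of how sensitivity and specificity could be computed from a set of called peaks and a reference set. It assumes a base-pair-level definition on a single chromosome with half-open intervals; the abstract does not state how overlap was counted, so this definition, the function names and the toy coordinates are illustrative assumptions rather than the study's actual procedure.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single chromosome (an assumption
# made for this sketch; real data would carry a chromosome name too).
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Total bases covered by a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def intersection_bp(a: List[Interval], b: List[Interval]) -> int:
    """Bases shared by two sorted, non-overlapping interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def sensitivity_specificity(
    peaks: List[Interval], reference: List[Interval], chrom_length: int
) -> Tuple[float, float]:
    """Base-pair-level sensitivity and specificity of peaks vs. reference."""
    p, r = merge(peaks), merge(reference)
    tp = intersection_bp(p, r)          # bases called and in the reference
    fp = covered_bp(p) - tp             # bases called but not in the reference
    fn = covered_bp(r) - tp             # reference bases that were missed
    tn = chrom_length - tp - fp - fn    # everything else
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must be non-empty and smaller than the chromosome")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example with made-up coordinates.
peaks = [(100, 200), (150, 300), (500, 600)]
reference = [(150, 250), (550, 700)]
sens, spec = sensitivity_specificity(peaks, reference, chrom_length=1000)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Repeating this calculation on the peak sets produced at each signal threshold would give one sensitivity/specificity pair per threshold, which is one way to compare callers across a range of settings.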
