Identification of transcription factor binding sites from ChIP-seq data at high resolution

MOTIVATION: Chromatin immunoprecipitation coupled to next-generation sequencing (ChIP-seq) is widely used to study the in vivo binding sites of transcription factors (TFs) and their regulatory targets. Recent improvements to ChIP-seq, such as increased resolution, promise deeper insights into transcriptional regulation but require new computational tools to exploit them fully.

RESULTS: To this end, we have developed peakzilla, which identifies TF binding sites at high resolution, resolving individual binding sites even when they are closely spaced. We demonstrate this using semisynthetic datasets, ChIP-seq for the TF Twist in Drosophila embryos performed with different experimental fragment sizes, and ChIP-exo datasets. We show that the increased resolution reached by peakzilla is highly relevant: closely spaced Twist binding sites are strongly enriched in transcriptional enhancers, suggesting a signature that discriminates functional TF binding from abundant non-functional or neutral binding. Peakzilla is easy to use because it estimates all necessary parameters from the data, and it is freely available.

AVAILABILITY AND IMPLEMENTATION: The peakzilla program is available from https://github.com/steinmann/peakzilla or http://www.starklab.org/data/peakzilla/.

CONTACT: stark@starklab.org

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
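The abstract states that peakzilla resolves closely spaced binding sites and estimates its parameters from the data. As a rough illustration of what such steps can involve, and explicitly not peakzilla's own algorithm, the Python sketch below estimates fragment size with a generic strand cross-correlation and then calls summits by moving each read to its estimated fragment center and taking local maxima of a Gaussian-smoothed profile. All function names and defaults (`smooth`, `coverage`, `estimate_fragment_size`, `call_summits`, `sd`, `max_shift`, `min_dist`, `min_frac`) are hypothetical; the sketch assumes read 5' end positions for a single region and ignores control/input samples and significance testing.

```python
import numpy as np


def smooth(values, sd=10.0):
    """Convolve a per-base signal with a Gaussian kernel truncated at +/- 3 sd."""
    half = int(3 * sd)
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sd) ** 2)
    return np.convolve(values, kernel, mode="same")


def coverage(positions, length):
    """Count read 5' ends per base; positions outside [0, length) are ignored."""
    pos = np.asarray(positions, dtype=int)
    pos = pos[(pos >= 0) & (pos < length)]
    cov = np.zeros(length)
    np.add.at(cov, pos, 1)  # unbuffered add, so repeated positions all count
    return cov


def estimate_fragment_size(plus, minus, length, max_shift=400):
    """Generic strand cross-correlation (hypothetical helper, not peakzilla's).

    Plus-strand 5' ends mark fragment starts; minus-strand 5' ends mark
    fragment ends, d - 1 bp downstream for a fragment of size d. The shift s
    that best aligns the two profiles therefore gives d = s + 1.
    """
    cp = smooth(coverage(plus, length))
    cm = smooth(coverage(minus, length))
    best_shift, best_r = 0, -np.inf
    for s in range(1, max_shift + 1):
        a, b = cp[:-s], cm[s:]
        if a.std() == 0 or b.std() == 0:
            continue  # correlation undefined for a constant profile
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_shift, best_r = s, r
    return best_shift + 1


def call_summits(plus, minus, length, d, sd=10.0, min_dist=50, min_frac=0.2):
    """Move reads to estimated fragment centers, smooth, report local maxima.

    Maxima below min_frac * (highest value) are dropped; of maxima closer
    than min_dist, only the tallest is kept.
    """
    centers = np.concatenate([
        np.asarray(plus, dtype=int) + d // 2,                # start -> center
        np.asarray(minus, dtype=int) - (d - 1 - d // 2),     # end -> center
    ])
    y = smooth(coverage(centers, length), sd)
    # Strict rise on the left, non-strict fall on the right: a flat-topped
    # maximum reports only its leftmost base.
    idx = np.where((y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:]))[0] + 1
    idx = idx[y[idx] >= min_frac * y.max()]
    kept = []
    for i in sorted(idx, key=lambda j: -y[j]):  # tallest candidates first
        if all(abs(i - j) >= min_dist for j in kept):
            kept.append(int(i))
    return sorted(kept)


if __name__ == "__main__":
    # Toy region: three sites, two of them only 100 bp apart, fragment size 150.
    sites = [1000, 1100, 3000]
    jitter = [-9, -6, -4, -2, -1, 0, 0, 1, 2, 4, 6, 9]
    plus = [s - 75 + j for s in sites for j in jitter]   # fragment starts
    minus = [s + 74 + j for s in sites for j in jitter]  # fragment ends (start + 149)
    d = estimate_fragment_size(plus, minus, 4000)
    print(d, call_summits(plus, minus, 4000, d))
```

On this toy region the sketch estimates a fragment size of 150 bp and reports summits at 1000, 1100 and 3000, so the two sites 100 bp apart stay separate. In this sketch, resolution is limited by the smoothing width and `min_dist`: sites closer than `min_dist`, or close enough that their smoothed profiles merge, are reported as a single summit.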
