Characterizing protein-DNA binding event subtypes in ChIP-exo data

Motivation
Regulatory proteins associate with the genome either by directly binding cognate DNA motifs or via protein‐protein interactions with other regulators. Each recruitment mechanism may be associated with distinct motifs and may also result in distinct characteristic patterns in high‐resolution protein‐DNA binding assays. For example, the ChIP‐exo protocol precisely characterizes protein‐DNA crosslinking patterns by combining chromatin immunoprecipitation (ChIP) with 5′ → 3′ exonuclease digestion. Since different regulatory complexes will result in different protein‐DNA crosslinking signatures, analysis of ChIP‐exo tag enrichment patterns should enable detection of multiple protein‐DNA binding modes for a given regulatory protein. However, current ChIP‐exo analysis methods either treat all binding events as being of a uniform type or rely on motifs to cluster binding events into subtypes.

Results
To systematically detect multiple protein‐DNA interaction modes in a single ChIP‐exo experiment, we introduce the ChIP‐exo mixture model (ChExMix). ChExMix probabilistically models the genomic locations and subtype memberships of binding events using both ChIP‐exo tag distribution patterns and DNA motifs. We demonstrate that ChExMix achieves accurate detection and classification of binding event subtypes using in silico mixed ChIP‐exo data. We further demonstrate the unique analysis abilities of ChExMix using a collection of ChIP‐exo experiments that profile the binding of key transcription factors in MCF‐7 cells. In these data, ChExMix identifies possible recruitment mechanisms of FoxA1 and ERα, thus demonstrating that ChExMix can effectively stratify ChIP‐exo binding events into biologically meaningful subtypes.

Availability and implementation
ChExMix is available from https://github.com/seqcode/chexmix.

Supplementary information
Supplementary data are available at Bioinformatics online.
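The Results paragraph describes ChExMix as a probabilistic mixture model in which binding event subtypes are characterized by ChIP-exo tag distribution patterns and DNA motifs. To make that idea concrete, here is a minimal Python sketch of expectation-maximization (EM) over subtype memberships. It is an illustration under stated assumptions, not ChExMix's implementation (the GitHub repository holds that). It assumes the following: event positions are already known and held fixed, although ChExMix also models genomic locations. Each subtype is a hypothetical `Subtype` record with a mixing weight and strictly positive forward- and reverse-strand probability vectors over offsets around the event. Tag counts on each strand are scored as a multinomial. The motif contribution is a precomputed per-subtype log-score supplied as input rather than learned. All names (`Subtype`, `Event`, `e_step`, `m_step`, `fit`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical layout for illustration only (not ChExMix's API): an event is
# (fwd_counts, rev_counts, motif_log_scores). The count lists hold tag 5' ends
# per offset around a fixed event position; motif_log_scores[k] is an assumed,
# precomputed sequence log-likelihood under subtype k.
Event = Tuple[Sequence[int], Sequence[int], Sequence[float]]


@dataclass
class Subtype:
    weight: float     # mixing proportion pi_k
    fwd: List[float]  # P(forward-strand tag 5' end at each offset), all > 0
    rev: List[float]  # P(reverse-strand tag 5' end at each offset), all > 0


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def e_step(event: Event, subtypes: List[Subtype]) -> List[float]:
    """Posterior probability that one event belongs to each subtype.

    Each strand's counts are scored as a multinomial over offsets; the
    multinomial coefficient is dropped because it is identical for all subtypes.
    """
    fwd, rev, motif = event
    log_post = []
    for k, st in enumerate(subtypes):
        ll = math.log(st.weight) + motif[k]
        ll += sum(c * math.log(p) for c, p in zip(fwd, st.fwd))
        ll += sum(c * math.log(p) for c, p in zip(rev, st.rev))
        log_post.append(ll)
    norm = log_sum_exp(log_post)
    return [math.exp(v - norm) for v in log_post]


def m_step(events: List[Event], resp: List[List[float]], n_subtypes: int,
           pseudo: float = 1.0, weight_pseudo: float = 1e-3) -> List[Subtype]:
    """Re-estimate weights and strand templates from responsibilities.

    Pseudocounts keep every probability strictly positive so log() stays finite.
    """
    width, n = len(events[0][0]), len(events)
    new = []
    for k in range(n_subtypes):
        r_k = [r[k] for r in resp]
        fwd = [pseudo + sum(r * ev[0][i] for r, ev in zip(r_k, events))
               for i in range(width)]
        rev = [pseudo + sum(r * ev[1][i] for r, ev in zip(r_k, events))
               for i in range(width)]
        f_tot, r_tot = sum(fwd), sum(rev)
        new.append(Subtype(
            weight=(sum(r_k) + weight_pseudo) / (n + n_subtypes * weight_pseudo),
            fwd=[x / f_tot for x in fwd],
            rev=[x / r_tot for x in rev],
        ))
    return new


def fit(events: List[Event], subtypes: List[Subtype],
        n_iter: int = 20) -> Tuple[List[Subtype], List[List[float]]]:
    """Alternate E and M steps, then return parameters and final posteriors."""
    for _ in range(n_iter):
        resp = [e_step(ev, subtypes) for ev in events]
        subtypes = m_step(events, resp, len(subtypes))
    resp = [e_step(ev, subtypes) for ev in events]
    return subtypes, resp


# Toy data: two events with a forward-left / reverse-right tag pattern and two
# with the mirrored pattern; motif scores are 0 (uninformative).
events: List[Event] = [
    ([5, 1, 0], [0, 1, 5], [0.0, 0.0]),
    ([6, 2, 0], [0, 1, 4], [0.0, 0.0]),
    ([0, 1, 5], [5, 1, 0], [0.0, 0.0]),
    ([0, 2, 6], [4, 1, 0], [0.0, 0.0]),
]
init = [Subtype(0.5, [0.4, 0.3, 0.3], [0.3, 0.3, 0.4]),
        Subtype(0.5, [0.3, 0.3, 0.4], [0.4, 0.3, 0.3])]
fitted, resp = fit(events, init)
for i, r in enumerate(resp):
    print(i, [round(x, 3) for x in r])
```

On this toy data the first two events and the last two should end up in different subtypes with high posterior probability, because their strand-specific tag patterns mirror each other. The M-step pseudocounts are a design choice: they keep every template probability positive, so an offset with no observed tags never yields log(0).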
