MACE: model-based analysis of ChIP-exo

Understanding the role of a given transcription factor (TF) in regulating gene expression requires precise mapping of its binding sites in the genome. Chromatin immunoprecipitation-exo (ChIP-exo), an emerging technique that uses λ exonuclease to digest TF-unbound DNA after ChIP, is designed to reveal transcription factor binding site (TFBS) boundaries at near single-nucleotide resolution. Although ChIP-exo promises deeper insights into transcriptional regulation, no dedicated bioinformatics tool exists to leverage its advantages. Most ChIP-seq and ChIP-chip analytic methods are not tailored for ChIP-exo and thus cannot take full advantage of high-resolution ChIP-exo data. Here we describe a novel analysis framework, termed MACE (model-based analysis of ChIP-exo), dedicated to ChIP-exo data analysis. The MACE workflow consists of four steps: (i) sequencing data normalization and bias correction; (ii) signal consolidation and noise reduction; (iii) single-nucleotide resolution border peak detection using the Chebyshev inequality; and (iv) border matching using the Gale-Shapley stable matching algorithm. When applied to published human CTCF and yeast Reb1 ChIP-exo data, and to our own mouse ONECUT1/HNF6 ChIP-exo data, MACE defines TFBSs with high sensitivity, specificity and spatial resolution, as evidenced by multiple criteria including motif enrichment, sequence conservation, direct sequence pileup, nucleosome positioning and open chromatin states. In addition, we show that the fundamental advance of MACE is that it identifies both boundaries of a TFBS with high resolution, whereas other methods report only a single location for the same binding event. The two boundaries help elucidate the in vivo binding structure of a given TF, e.g. whether the TF binds as a dimer or in a complex with other co-factors.
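Steps (iii) and (iv) can be illustrated with a minimal Python sketch. This is not the MACE implementation: the function names, the 0.05 cutoff, the `max_width` limit and the "nearest border first" preference rule are all assumptions made for illustration. The first function uses the Chebyshev inequality, P(|X - μ| ≥ kσ) ≤ 1/k², as a distribution-free upper bound on how unusual a per-position signal value is; the second pairs forward-strand (left) borders with reverse-strand (right) borders by Gale-Shapley stable matching.

```python
from statistics import mean, pstdev


def chebyshev_border_candidates(signal, p_cutoff=0.05):
    """Return positions whose signal lies k std devs above the mean with 1/k^2 <= p_cutoff.

    Hypothetical helper: the cutoff and the use of the global mean/std are assumptions.
    """
    mu = mean(signal)
    sigma = pstdev(signal)
    if sigma == 0:
        return []
    hits = []
    for pos, value in enumerate(signal):
        k = (value - mu) / sigma
        # Chebyshev bound: P(|X - mu| >= k * sigma) <= 1 / k^2
        if k > 0 and 1.0 / (k * k) <= p_cutoff:
            hits.append(pos)
    return hits


def match_borders(left, right, max_width=100):
    """Pair left and right borders with Gale-Shapley; left borders propose.

    Assumed preference rule: both sides prefer the partner giving the
    narrowest site, and only right borders downstream of the left border
    within max_width are admissible. Positions are assumed to be unique.
    """
    prefs = {
        l: sorted((r for r in right if 0 < r - l <= max_width), key=lambda r: r - l)
        for l in left
    }
    next_choice = {l: 0 for l in left}
    engaged = {}  # right border -> left border currently holding it
    free = list(left)
    while free:
        l = free.pop()
        if next_choice[l] >= len(prefs[l]):
            continue  # no admissible partners left; border stays unmatched
        r = prefs[l][next_choice[l]]
        next_choice[l] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = l
        elif r - l < r - current:
            # r prefers the closer left border and releases the other one
            engaged[r] = l
            free.append(current)
        else:
            free.append(l)
    return sorted((l, r) for r, l in engaged.items())


if __name__ == "__main__":
    toy_signal = [0] * 100
    toy_signal[40] = 100  # one sharp border peak in a flat background
    print(chebyshev_border_candidates(toy_signal))
    print(match_borders([100, 300], [140, 345]))
```

In this sketch a left border that loses every competition simply stays unmatched, which is one way to treat a border with no plausible partner; a real pipeline would need its own rule for such cases.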
