Extracting transcription factor targets from ChIP-Seq data

ChIP-Seq technology, which combines chromatin immunoprecipitation (ChIP) with massively parallel sequencing, is rapidly replacing ChIP-on-chip for the genome-wide identification of transcription factor binding events. Identifying bound regions from the large number of sequence tags produced by ChIP-Seq is a challenging task. Here, we present GLITR (GLobal Identifier of Target Regions), which accurately identifies enriched regions in target data by calculating a fold-change based on random samples of control (input chromatin) data. GLITR uses a classification method to identify regions in ChIP data whose peak height and fold-change do not resemble those of regions in an input sample. We compare GLITR to several recent methods and show that GLITR has improved sensitivity for identifying bound regions that closely match the consensus sequence of a given transcription factor, and that it can detect bona fide transcription factor targets missed by other programs. We also use GLITR to address the issue of sequencing depth, and show that sequencing biological replicates identifies far more binding regions than re-sequencing the same sample.
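The idea of a fold-change computed against random samples of control data can be illustrated with a minimal Python sketch. This is not GLITR's implementation: the function names, the number of samples, the choice to draw as many control tags as there are ChIP tags, and the pseudocount of 1 are all assumptions made for illustration, and tags are reduced to single genomic coordinates on one chromosome.

```python
import random
from statistics import mean


def count_in_region(tags, start, end):
    """Count tag positions falling in the half-open interval [start, end)."""
    return sum(1 for t in tags if start <= t < end)


def fold_change_vs_sampled_control(chip_tags, control_tags, region,
                                   n_samples=100, seed=0):
    """Hypothetical helper: ChIP tag count in a region divided by the mean
    count observed across random samples of the control (input) tags.

    Assumptions (not taken from the source): each control sample has the
    same size as the ChIP tag set, and a pseudocount of 1 is added to both
    numerator and denominator to avoid division by zero.
    """
    if len(control_tags) < len(chip_tags):
        raise ValueError("control pool must be at least as large as the ChIP set")

    rng = random.Random(seed)  # seeded so the sketch is reproducible
    start, end = region
    chip_count = count_in_region(chip_tags, start, end)

    sampled_counts = []
    for _ in range(n_samples):
        # Draw a depth-matched random subset of the control tags.
        sample = rng.sample(control_tags, len(chip_tags))
        sampled_counts.append(count_in_region(sample, start, end))

    return (chip_count + 1) / (mean(sampled_counts) + 1)


if __name__ == "__main__":
    # Toy data: 50 ChIP tags piled up inside the region, 50 scattered outside.
    chip = [1000 + i for i in range(50)] + [5000 + 10 * i for i in range(50)]
    # Control tags spread over positions that never overlap the region.
    control = [20000 + 7 * i for i in range(400)]
    print(fold_change_vs_sampled_control(chip, control, (1000, 1200)))
```

Sampling the control to the same depth as the ChIP data keeps the two counts directly comparable without a separate scaling factor. A classifier of the kind the abstract describes would then take such a fold-change, together with peak height, as input features; that step is not sketched here.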
