BIOINFORMATICS ORIGINAL PAPER TIP: A probabilistic method for identifying transcription factor target genes from ChIP-seq binding profiles

Motivation: ChIP-seq and ChIP-chip experiments have been widely used to identify transcription factor (TF) binding sites and target genes. Conventionally, a fairly ‘simple’ approach is employed for target gene identification e.g. finding genes with binding sites within 2kb of a transcription start site (TSS). However, this does not take into account the number of sites upstream of the TSS, their exact positioning or the fact that different TFs appear to act at different characteristic distances from the TSS. Results: Here we propose a probabilistic model called target identification from profiles (TIP) that quantitatively measures the regulatory relationships between TFs and target genes. For each TF, our model builds a characteristic, averaged profile of binding around the TSS and then uses this to weight the sites associated with a given gene, providing a continuous-valued ‘regulatory’ score relating each TF and potential target. Moreover, the score can readily be turned into a ranked list of target genes and an estimate of significance, which is useful for case-dependent downstream analysis. Conclusion: We show the advantages of TIP by comparing it to the ‘simple’ approach on several representative datasets, using motif occurrence and relationship to knock-out experiments as metrics of validation. Moreover, we show that the probabilistic model is not as sensitive to various experimental parameters (including sequencing depth and peak-calling method) as the simple approach; in fact, the lesser dependence on sequencing depth potentially utilizes the result of a ChIP-seq experiment in a more ‘cost-effective’ manner. Supplementary Supplementary data are available at Bioinformatics

[1]  J. Stamatoyannopoulos,et al.  Quantitative Models of the Mechanisms That Control Genome-Wide Patterns of Transcription Factor Binding during Early Drosophila Development , 2011, PLoS genetics.

[2]  Raymond K. Auerbach,et al.  Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project , 2010, Science.

[3]  Gos Micklem,et al.  Supporting Online Material Materials and Methods Figs. S1 to S50 Tables S1 to S18 References Identification of Functional Elements and Regulatory Circuits by Drosophila Modencode , 2022 .

[4]  Hans Clevers,et al.  Efficient Double Fragmentation ChIP-seq Provides Nucleotide Resolution Protein-DNA Binding Profiles , 2010, PloS one.

[5]  Stephen A. Ramsey,et al.  Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites , 2010, Bioinform..

[6]  M. Facciotti,et al.  Evaluation of Algorithm Performance in ChIP-Seq Peak Detection , 2010, PloS one.

[7]  C. Zang,et al.  Discrete roles of STAT4 and STAT6 transcription factors in tuning epigenetic modifications and transcription during T helper cell differentiation. , 2010, Immunity.

[8]  B. Komm,et al.  Genome-Wide Analysis of Estrogen Receptor α DNA Binding and Tethering Mechanisms Identifies Runx1 as a Novel Tethering Factor in Receptor-Mediated Transcriptional Activation , 2010, Molecular and Cellular Biology.

[9]  Jason B. Ernst,et al.  Integrating multiple evidence sources to predict transcription factor binding in the human genome. , 2010, Genome research.

[10]  Anushya Muruganujan,et al.  PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium , 2009, Nucleic Acids Res..

[11]  B. Ren,et al.  Genome-wide prediction of transcription factor binding sites using an integrated model , 2010, Genome Biology.

[12]  A. Mortazavi,et al.  Computation for ChIP-seq and RNA-seq studies , 2009, Nature Methods.

[13]  P. Park ChIP–seq: advantages and challenges of a maturing technology , 2009, Nature Reviews Genetics.

[14]  James B. Brown,et al.  Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions , 2009, Genome Biology.

[15]  G. Tuteja,et al.  Extracting transcription factor targets from ChIP-Seq data , 2009, Nucleic acids research.

[16]  T. Bailey,et al.  High-throughput chromatin information enables accurate tissue-specific prediction of transcription factor binding sites , 2008, Nucleic acids research.

[17]  Raymond K. Auerbach,et al.  PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls , 2009, Nature Biotechnology.

[18]  David A. Nix,et al.  Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks , 2008, BMC Bioinformatics.

[19]  Clifford A. Meyer,et al.  Model-based Analysis of ChIP-Seq (MACS) , 2008, Genome Biology.

[20]  Megan F. Cole,et al.  Connecting microRNA Genes to the Core Transcriptional Regulatory Circuitry of Embryonic Stem Cells , 2008, Cell.

[21]  Lucas D. Ward,et al.  Predicting functional transcription factor binding through alignment-free and affinity-based analysis of orthologous promoter sequences , 2008, ISMB.

[22]  N. D. Clarke,et al.  Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells , 2008, Cell.

[23]  S. Orkin,et al.  An Extended Transcriptional Network for Pluripotency of Embryonic Stem Cells , 2008, Cell.

[24]  D. W. Knowles,et al.  Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm , 2008, PLoS biology.

[25]  William Stafford Noble,et al.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project , 2007, Nature.

[26]  A. Mortazavi,et al.  Genome-Wide Mapping of in Vivo Protein-DNA Interactions , 2007, Science.

[27]  Zhiping Weng,et al.  Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. , 2007, Genome research.

[28]  Megan F. Cole,et al.  Core Transcriptional Regulatory Circuitry in Human Embryonic Stem Cells , 2005, Cell.

[29]  Nicola J. Rinaldi,et al.  Transcriptional regulatory code of a eukaryotic genome , 2004, Nature.

[30]  John D. Storey,et al.  Statistical significance for genomewide studies , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[31]  Tom H. Pringle,et al.  The human genome browser at UCSC. , 2002, Genome research.

[32]  Holger Karas,et al.  TRANSFAC: a database on transcription factors and their DNA binding sites , 1996, Nucleic Acids Res..

[33]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.