Assessing Computational Methods for Transcription Factor Target Gene Identification Based on ChIP-seq Data

Chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) measures genome-wide binding of transcription factors (TFs) at high resolution and therefore has great potential for elucidating transcriptional networks. Despite the precision of these experiments, identifying the genes directly regulated by a TF (its target genes) is not trivial. Numerous target gene scoring methods have been used in the past, but their suitability for the task and their relative performance remain unclear because a thorough comparative assessment is still lacking. Here we present a systematic evaluation of computational methods for defining TF targets based on ChIP-seq data. We validated predictions based on 68 ChIP-seq studies using a wide range of genomic expression data and functional information. We demonstrate that peak-to-gene assignment is the most crucial step for correct target gene prediction, and we propose a parameter-free method that performs most consistently across the evaluation tests.
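
To make the peak-to-gene assignment step concrete, the following is a minimal sketch of one simple and widely used rule: assign each peak to the gene with the nearest transcription start site (TSS). It is an illustration only and is not claimed to be the parameter-free method proposed here. The function name, the input layout, and the gene names and coordinates are all hypothetical.

```python
from bisect import bisect_left


def assign_peaks_to_nearest_tss(peaks, tss_by_chrom):
    """Assign each peak summit to the gene with the closest TSS.

    peaks        : list of (chrom, summit_position) tuples (hypothetical layout)
    tss_by_chrom : dict mapping chrom -> list of (tss_position, gene),
                   sorted by tss_position
    Returns a list of (chrom, summit, gene, distance); gene and distance
    are None when the chromosome has no annotated TSS.
    """
    assignments = []
    for chrom, summit in peaks:
        tss_list = tss_by_chrom.get(chrom, [])
        if not tss_list:
            assignments.append((chrom, summit, None, None))
            continue
        positions = [pos for pos, _ in tss_list]
        # Index of the first TSS at or downstream of the summit.
        i = bisect_left(positions, summit)
        # Only the flanking TSSs can be the nearest one.
        candidates = tss_list[max(i - 1, 0): i + 1]
        pos, gene = min(candidates, key=lambda t: abs(t[0] - summit))
        assignments.append((chrom, summit, gene, abs(pos - summit)))
    return assignments


# Toy annotation and peaks (made-up values, for illustration only).
tss_by_chrom = {"chr1": [(1000, "geneA"), (5000, "geneB")]}
peaks = [("chr1", 1200), ("chr1", 4000), ("chr2", 50)]
result = assign_peaks_to_nearest_tss(peaks, tss_by_chrom)
```

In this sketch every peak on an annotated chromosome receives exactly one gene regardless of distance. A real analysis would also have to decide how to handle strand, alternative TSSs, and distal peaks, and those are the kinds of choices such an evaluation compares.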
