Detecting clustering and ordering binding patterns among transcription factors via point process models

MOTIVATION: Recent developments in ChIP-Seq technology have generated binding data for many transcription factors (TFs) in various cell types and cellular conditions. This opens great opportunities for studying combinatorial binding patterns among a set of TFs active in a particular cellular condition, which is a key component for understanding the interaction between TFs in gene regulation.

RESULTS: As a first step toward identifying combinatorial binding patterns, we develop statistical methods to detect clustering and ordering patterns among the binding sites (BSs) of a pair of TFs. Testing procedures based on Ripley's K-function and its generalizations are developed to identify binding patterns from large collections of BSs in ChIP-Seq data. We have applied our methods to the ChIP-Seq data of 91 pairs of TFs in mouse embryonic stem cells. Our methods have detected clustering binding patterns between most TF pairs, which is consistent with findings in the literature, and have identified significant ordering preferences, relative to the direction of target gene transcription, among the BSs of seven TFs. More interestingly, our results demonstrate that the identified clustering and ordering binding patterns between TFs are associated with the expression of the target genes. These findings provide new insights into co-regulation between TFs.

AVAILABILITY AND IMPLEMENTATION: Source code is available at www.stat.ucla.edu/~zhou/TFKFunctions/.
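To make the idea of a K-function-based clustering test concrete, the following is a minimal sketch and not the authors' implementation. It assumes BSs are points on a single linear region of known length, uses a one-dimensional cross K-function without edge correction, and compares the observed value against a Monte Carlo null in which the second TF's sites are placed uniformly at random. The function names (`cross_k_1d`, `clustering_p_value`) and the uniform null model are illustrative assumptions; the paper's actual procedures and generalizations may differ.

```python
import numpy as np


def cross_k_1d(x, y, r, length):
    """Naive 1D cross K-function estimate (no edge correction).

    x, y   : binding-site positions of two TFs on a region [0, length]
    r      : distance threshold
    Returns length * (#pairs with |x_i - y_j| <= r) / (n_x * n_y).
    Under independent uniform placement this is roughly 2 * r.
    """
    x = np.asarray(x, dtype=float)
    y = np.sort(np.asarray(y, dtype=float))
    # For each x_i, count the y_j falling inside [x_i - r, x_i + r].
    upper = np.searchsorted(y, x + r, side="right")
    lower = np.searchsorted(y, x - r, side="left")
    pairs = np.sum(upper - lower)
    return length * pairs / (len(x) * len(y))


def clustering_p_value(x, y, r, length, n_sim=999, seed=0):
    """One-sided Monte Carlo p-value for clustering between x and y.

    Assumed null model (hypothetical, for illustration only): the sites
    of the second TF are uniform on [0, length], independent of x.
    """
    rng = np.random.default_rng(seed)
    observed = cross_k_1d(x, y, r, length)
    null = np.array([
        cross_k_1d(x, rng.uniform(0, length, size=len(y)), r, length)
        for _ in range(n_sim)
    ])
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_sim + 1)
```

A large observed value of the cross K-function relative to the simulated null suggests that sites of the two TFs lie closer together than expected by chance. In practice, BS density varies strongly along the genome, so a uniform null of this kind is likely too simple for real ChIP-Seq data and is used here only to show the mechanics of the test.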
