Identification of Context-Dependent Motifs by Contrasting ChIP Binding Data

MOTIVATION: DNA binding proteins play crucial roles in the regulation of gene expression. Transcription factors (TFs) activate or repress genes directly, while other proteins influence chromatin structure for transcription. The binding sites of a TF share a similar sequence pattern called a motif. However, TFs and motifs do not map one-to-one. Many TFs in a protein family may recognize the same motif, with subtle nucleotide differences leading to different binding affinities. Additionally, a particular TF may bind different motifs under certain conditions, for example in the presence of different co-regulators. The availability of genome-wide binding data for multiple collaborative TFs makes it possible to detect such context-dependent motifs.

RESULTS: We developed a contrast motif finder (CMF) for the de novo identification of motifs that are differentially enriched in two sets of sequences. Applying this method to a number of TF binding datasets from mouse embryonic stem cells, we demonstrate that CMF achieves substantially higher accuracy than several well-known motif finding methods. By contrasting sequences bound by distinct sets of TFs, CMF identified two different motifs that may be recognized by Oct4 depending on the presence of another co-regulator, and detected subtle motif signals that may be associated with potential competitive binding between Sox2 and Tcf3.

AVAILABILITY: The software CMF is freely available for academic use at www.stat.ucla.edu/~zhou/CMF.
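To make the notion of a pattern being "differentially enriched in two sets of sequences" concrete, the following minimal Python sketch ranks k-mers by how much more often they occur in one sequence set than in another. It is not the CMF algorithm, whose model is not described in this abstract. The function names `kmer_presence` and `contrast_kmers`, the pseudocount, and the toy sequences are all assumptions made up for illustration.

```python
from collections import Counter
from math import log2


def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        # A set ensures each k-mer is counted once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def contrast_kmers(set_a, set_b, k=6, pseudo=0.5):
    """Rank k-mers by the log2 ratio of their per-sequence frequency
    in set_a versus set_b (hypothetical scoring, not the CMF model)."""
    a = kmer_presence(set_a, k)
    b = kmer_presence(set_b, k)
    scores = {}
    for kmer in set(a) | set(b):
        # The pseudocount keeps the ratio finite when a k-mer is absent from one set.
        fa = (a[kmer] + pseudo) / (len(set_a) + 2 * pseudo)
        fb = (b[kmer] + pseudo) / (len(set_b) + 2 * pseudo)
        scores[kmer] = log2(fa / fb)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy sequences invented for this example; they are not real binding data.
set_a = ["ACGTTTGCAT", "GGTTTGCATA", "TTTGCATCCA"]
set_b = ["ACGTACGTAC", "GGCCGGCCAA", "TTACGTCCAA"]

ranked = contrast_kmers(set_a, set_b, k=6)
for kmer, score in ranked[:3]:
    print(kmer, round(score, 3))
```

The k-mers shared by every sequence in the first set and absent from the second rise to the top of the ranking. A real contrast method would also need to handle degenerate positions, both strands, and background composition, none of which this sketch attempts.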
