Mining ChIP-chip data for transcription factor and cofactor binding sites

MOTIVATION: Identification of single motifs and motif pairs that can be used to predict transcription factor localization in ChIP-chip data and gene expression in tissue-specific microarray data.

RESULTS: We describe a methodology for de novo identification of individual binding site motifs and interacting motif pairs from ChIP-chip data, using an algorithm that integrates localization data directly into the motif discovery process. We combine matrix-enumeration-based motif discovery with multivariate regression to evaluate candidate motifs and identify motif interactions. When applied to the HNF localization data in liver and pancreatic islets, our methods produce motifs that are either novel or improved versions of known motifs. All motif pairs identified as predictive of localization are further evaluated according to how well they predict expression in liver and islets and how well the relative positions of their occurrences are conserved. We find that interaction models of HNF1 and CDP motifs provide excellent prediction of both HNF1 localization and gene expression in liver. Our results demonstrate that ChIP-chip data can be used to identify interacting binding site motifs.

AVAILABILITY: Motif discovery programs and analysis tools are available on request from the authors.
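As a rough illustration of the idea of scoring candidate motifs against sequences and then regressing localization on motif scores with an interaction term, the sketch below may help. It is not the authors' program: the function names (`max_pwm_score`, `fit_interaction_model`), the toy weight matrix, the synthetic data, and the use of ordinary least squares in place of whatever multivariate regression the authors actually used are all assumptions made for illustration.

```python
import numpy as np

BASES = "ACGT"  # row order assumed for every weight matrix below


def max_pwm_score(seq, pwm):
    """Best score of `pwm` (shape 4 x width) over all windows of `seq`."""
    width = pwm.shape[1]
    idx = [BASES.index(b) for b in seq]
    scores = [
        sum(pwm[idx[i + j], j] for j in range(width))
        for i in range(len(seq) - width + 1)
    ]
    return max(scores)


def fit_interaction_model(x1, x2, y):
    """Least-squares fit of y ~ b0 + b1*x1 + b2*x2 + b3*x1*x2.

    Returns the coefficient vector and the R^2 of the fit.
    """
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid.dot(resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Toy weight matrix favouring the dinucleotide "AC" (hypothetical values).
toy_pwm = np.array([
    [1.0, -1.0],   # A
    [-1.0, 1.0],   # C
    [-1.0, -1.0],  # G
    [-1.0, -1.0],  # T
])
toy_score = max_pwm_score("GGACGT", toy_pwm)

# Synthetic motif scores and a synthetic "localization" signal in which
# the two motifs matter mainly through their product (the interaction).
rng = np.random.default_rng(0)
n = 500
score_a = rng.normal(size=n)  # stand-in for per-promoter scores of motif A
score_b = rng.normal(size=n)  # stand-in for per-promoter scores of motif B
signal = (0.5 + 1.0 * score_a + 2.0 * score_a * score_b
          + rng.normal(scale=0.1, size=n))

coef, r2 = fit_interaction_model(score_a, score_b, signal)
print("coefficients (b0, b1, b2, b3):", np.round(coef, 2))
print("R^2:", round(float(r2), 3))
```

In this toy setting a large fitted `b3` relative to `b1` and `b2` is what would flag the pair as interacting; a real analysis would need measured ChIP-chip ratios, proper background models for the motif scores, and a significance test for the interaction term.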
