Discovering transcription factor regulatory targets using gene expression and binding data

MOTIVATION: Identifying the target genes regulated by transcription factors (TFs) is the most basic step in understanding gene regulation. Recent advances in high-throughput sequencing technology, together with chromatin immunoprecipitation (ChIP), enable genome-wide mapping of TF binding sites, but function cannot be inferred from binding alone. This is especially true in mammalian systems, where regulation often occurs through long-range enhancers in gene-rich neighborhoods rather than through proximal promoters, which prevents straightforward assignment of a binding site to a target gene.

RESULTS: We present EMBER (Expectation Maximization of Binding and Expression pRofiles), a method that integrates high-throughput binding data (e.g. ChIP-chip or ChIP-seq) with gene expression data (e.g. DNA microarray) via an unsupervised machine learning algorithm to infer the gene targets of sets of TF binding sites. The genes selected are those that match overrepresented expression patterns, which can in turn provide information about multiple TF regulatory modes. We apply the method to genome-wide human breast cancer data and demonstrate that EMBER confirms a role for the TFs estrogen receptor alpha and retinoic acid receptors alpha and gamma in breast cancer development, whereas the conventional approach of assigning regulatory targets based on proximity does not. Additionally, we compare several predicted target genes from EMBER to previously inferred interactions, examine combinatorial effects of TFs on gene regulation and illustrate the ability of EMBER to discover multiple modes of regulation.

AVAILABILITY: All code used for this work is available at http://dinner-group.uchicago.edu/downloads.html.
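To make the idea of selecting genes that match overrepresented expression patterns more concrete, the following is a minimal sketch of a generic two-component mixture fitted by expectation maximization. It is not the EMBER implementation, which is available at the URL above. The sketch assumes that each candidate gene near a binding site has already been reduced to a discrete expression-pattern index, and that the background pattern frequencies (estimated from all genes) are held fixed. The function name `em_target_patterns` and all parameters are hypothetical.

```python
from collections import Counter


def em_target_patterns(patterns, background, n_iter=50, pseudocount=1e-3):
    """Toy EM for a target-vs-background mixture over discrete patterns.

    patterns:   list of pattern indices, one per candidate gene (assumed input)
    background: fixed probability of each pattern among all genes (assumed input)
    Returns (target distribution, mixing weight, posterior per pattern).
    """
    n = len(background)
    counts = Counter(patterns)
    total = len(patterns)

    # Initialise the target distribution from the candidate-gene frequencies.
    target = [(counts.get(k, 0) + pseudocount) / (total + pseudocount * n)
              for k in range(n)]
    pi = 0.5  # initial guess for the fraction of candidates that are targets
    posterior = [0.0] * n

    for _ in range(n_iter):
        # E-step: probability that a gene showing pattern k is a true target.
        posterior = [
            pi * target[k] / (pi * target[k] + (1.0 - pi) * background[k])
            for k in range(n)
        ]
        # M-step: re-estimate the target distribution and the mixing weight
        # from the posterior-weighted pattern counts.
        weighted = [counts.get(k, 0) * posterior[k] for k in range(n)]
        wsum = sum(weighted)
        target = [(w + pseudocount) / (wsum + pseudocount * n) for w in weighted]
        pi = wsum / total

    return target, pi, posterior


# Hypothetical data: 4 possible patterns, uniform background; pattern 2 is
# overrepresented among the 100 candidate genes near binding sites.
candidates = [2] * 70 + [0] * 10 + [1] * 10 + [3] * 10
target, pi, posterior = em_target_patterns(candidates, [0.25] * 4)
```

In this toy setting, genes showing the overrepresented pattern receive a higher posterior than genes showing the other patterns, so thresholding `posterior` would select them as putative targets. With a fixed background the mixing weight and target distribution are not uniquely identifiable, so the values depend on initialisation; a real analysis would need a more careful model.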
