Understanding transcriptional regulation by integrative analysis of transcription factor binding data

Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply such models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy for expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those of TSSs with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSSs can be precisely reflected, in a quantitative manner, by differences in TF-binding signals, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features, such as histone modifications and DNase hypersensitivity, in determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
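To make the idea of predicting expression from TF-binding signals concrete, the following is a minimal sketch on synthetic data. It assumes a plain linear least-squares model, which is an illustrative choice and not necessarily the model used in this work. The function names (`fit_linear`, `predict`, `pearson`), the three hypothetical TFs, and their weights are all invented for the example.

```python
import random

def fit_linear(X, y, ridge=1e-6):
    """Least squares with an intercept, solved through the normal equations.
    A tiny ridge term on the diagonal keeps the system well conditioned."""
    n, d = len(X), len(X[0]) + 1
    A = [[1.0] + list(row) for row in X]          # prepend intercept column
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(A[k][i] * y[k] for k in range(n)) for i in range(d)]
    # Gaussian elimination with partial pivoting
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    # Back substitution
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(M[i][j] * w[j] for j in range(i + 1, d))) / M[i][i]
    return w                                       # w[0] is the intercept

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Synthetic data (assumption): 200 TSSs, binding signals of 3 hypothetical TFs,
# and a log-scale expression value that depends linearly on them plus noise.
rng = random.Random(0)
TRUE_WEIGHTS = [1.5, -0.8, 0.3]
X = [[rng.uniform(0, 5) for _ in TRUE_WEIGHTS] for _ in range(200)]
y = [2.0 + sum(w * x for w, x in zip(TRUE_WEIGHTS, row)) + rng.gauss(0, 0.5)
     for row in X]

# Fit on 150 TSSs, then measure prediction accuracy on the 50 held-out TSSs.
weights = fit_linear(X[:150], y[:150])
r_test = pearson(predict(weights, X[150:]), y[150:])
print("fitted weights:", [round(w, 2) for w in weights])
print("held-out Pearson r:", round(r_test, 3))
```

Under the same assumptions, differential expression between two cell lines could be modeled by fitting the same kind of regression to per-TSS differences, with the difference in each TF's binding signal as the predictor and the difference in expression as the response.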
