Inference of RNA Polymerase II Transcription Dynamics from Chromatin Immunoprecipitation Time Course Data

Gene transcription mediated by RNA polymerase II (pol-II) is a key step in gene expression. The dynamics of pol-II moving along the transcribed region influence the rate and timing of gene expression. In this work, we present a probabilistic model of transcription dynamics that is fitted to pol-II occupancy time course data measured using ChIP-seq. The model can be used to estimate transcription speed and to infer the temporal pol-II activity profile at the gene promoter. Model parameters are estimated either by maximum likelihood or by Bayesian inference with Markov chain Monte Carlo sampling. The Bayesian approach provides confidence intervals for parameter estimates and allows the use of priors that capture domain knowledge, such as the range of transcription speeds expected from previous experiments. The model describes the movement of pol-II down the gene body and can be used to identify the time of induction for transcriptionally engaged genes. By clustering the inferred promoter activity time profiles, we can determine which genes respond quickly to stimuli and group genes that share activity profiles and may therefore be co-regulated. We apply our methodology to ChIP-seq measurements of genome-wide pol-II occupancy in MCF-7 human breast cancer cells treated with estradiol (E2). The transcription speeds we obtain agree with those reported previously for smaller numbers of genes, with the advantage that our approach can be applied genome-wide. We validate the biological significance of the pol-II promoter activity clusters by investigating cluster-specific transcription factor binding patterns and determining canonical pathway enrichment. We find that rapidly induced genes are enriched for both estrogen receptor alpha (ER) and FOXA1 binding in their proximal promoter regions.
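
The link between pol-II movement along the gene body and transcription speed can be illustrated with a toy calculation. The sketch below is not the probabilistic model presented in this work: it swaps in a simple grid search over time delays followed by a least-squares fit, and it runs on synthetic data. The pulse shape, sampling times, segment distances, noise level, speed of 2 kb/min, and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def promoter_activity(t):
    """Hypothetical promoter activity: a smooth pulse peaking 30 min after stimulation."""
    return np.exp(-0.5 * ((t - 30.0) / 6.0) ** 2)


def estimate_delay(t, reference, delayed, candidate_delays):
    """Pick the candidate delay whose shifted reference best matches the delayed signal."""
    errors = []
    for d in candidate_delays:
        # Evaluate the reference profile at t - d by linear interpolation.
        shifted = np.interp(t - d, t, reference)
        errors.append(np.sum((shifted - delayed) ** 2))
    return candidate_delays[int(np.argmin(errors))]


def estimate_speed(distances_kb, delays_min):
    """Fit delay = distance / speed by least squares through the origin."""
    slope = np.dot(distances_kb, delays_min) / np.dot(distances_kb, distances_kb)
    return 1.0 / slope


# Synthetic setup (all values assumed): occupancy sampled every minute for 160 min,
# at four gene-body segments located 10, 20, 40 and 60 kb from the promoter.
t = np.arange(0.0, 161.0, 1.0)
true_speed = 2.0  # kb/min, assumed for the simulation
distances = np.array([10.0, 20.0, 40.0, 60.0])
rng = np.random.default_rng(0)

reference = promoter_activity(t)
candidates = np.arange(0.0, 60.0, 0.5)

# Each segment sees the promoter profile delayed by distance / speed, plus noise.
delays = np.array([
    estimate_delay(
        t,
        reference,
        promoter_activity(t - d / true_speed) + rng.normal(0.0, 0.02, t.size),
        candidates,
    )
    for d in distances
])

speed = estimate_speed(distances, delays)
print(f"estimated delays (min): {delays}")
print(f"estimated speed (kb/min): {speed:.2f}")
```

Real ChIP-seq time courses are sampled far more sparsely than this synthetic series, and a point estimate of this kind carries no uncertainty, which is one motivation for a model-based treatment of the delay and speed parameters.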
