Identifying targets of multiple co-regulating transcription factors from expression time-series by Bayesian model comparison

Background: Inferring a complete transcriptional regulatory network is a huge challenge because of the complexity of the network and the sparsity of available data. One way to make the problem more manageable is to focus on inferring context-specific networks involving a few interacting transcription factors (TFs) and all of their target genes.

Results: We present a computational framework for Bayesian statistical inference of the target genes of multiple interacting TFs from high-throughput gene expression time-series data. We use ordinary differential equation models that describe the transcription of target genes, taking combinatorial regulation into account. The method consists of a training phase and a prediction phase. During the training phase we infer the unobserved TF protein concentrations on a subnetwork of approximately known regulatory structure. During the prediction phase we apply Bayesian model selection on a genome-wide scale and score all alternative regulatory structures for each target gene. We use our methodology to identify targets of five TFs regulating Drosophila melanogaster mesoderm development. We find that confidently predicted links between TFs and targets are significantly enriched for supporting ChIP-chip binding events and annotated TF-gene interactions. Our method outperforms existing alternatives by a statistically significant margin.

Conclusions: Our results show that it is possible to infer regulatory links between multiple interacting TFs and their target genes even from a single, relatively short time series, and in the presence of unmodelled confounders and unreliable prior knowledge of training network connectivity. Introducing data from several different experimental perturbations significantly increases the accuracy.
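The model comparison step of the prediction phase can be illustrated with a minimal Python sketch. It enumerates every subset of five regulators as a candidate regulatory structure for one target gene and turns per-structure log marginal likelihoods into posterior probabilities, from which a posterior probability for each TF-target link follows. The placeholder TF names (`TF1` to `TF5`), the uniform prior over structures, the function names and the toy log-evidence values are all assumptions made for illustration; in the method described above the marginal likelihoods would come from the ODE-based models, which are not reproduced here.

```python
import itertools
import math

# Hypothetical placeholder names for the five regulators (assumption).
TFS = ["TF1", "TF2", "TF3", "TF4", "TF5"]


def enumerate_structures(tfs):
    """Return every subset of regulators, including the empty set."""
    return [
        frozenset(combo)
        for size in range(len(tfs) + 1)
        for combo in itertools.combinations(tfs, size)
    ]


def structure_posteriors(log_evidence, log_prior=None):
    """Normalise log marginal likelihoods into posterior probabilities.

    log_evidence maps each structure to its log marginal likelihood.
    log_prior maps each structure to its log prior; uniform if omitted.
    """
    if log_prior is None:
        log_prior = {s: 0.0 for s in log_evidence}  # uniform prior (assumption)
    scores = {s: log_evidence[s] + log_prior[s] for s in log_evidence}
    # Log-sum-exp keeps the normalisation numerically stable.
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in scores.values()))
    return {s: math.exp(v - log_norm) for s, v in scores.items()}


def link_posterior(posteriors, tf):
    """Posterior probability that tf regulates the gene, summed over structures."""
    return sum(p for s, p in posteriors.items() if tf in s)


if __name__ == "__main__":
    structures = enumerate_structures(TFS)
    # Toy log evidences (assumption): structures containing TF1 fit better.
    toy_log_evidence = {s: (2.0 if "TF1" in s else 0.0) for s in structures}
    post = structure_posteriors(toy_log_evidence)
    for tf in TFS:
        print(tf, round(link_posterior(post, tf), 3))
```

Summing structure posteriors per link, rather than reporting only the single best structure, is one common way to rank candidate TF-target links; whether the original method ranks links exactly this way is not stated in the abstract.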
