The signal within the noise: efficient inference of stochastic gene regulation models using fluorescence histograms and stochastic simulations

MOTIVATION: In the noisy cellular environment, stochastic fluctuations at the molecular level manifest as cell-to-cell variability at the population level that is quantifiable using high-throughput single-cell measurements. Such variability is rich with information about the cell's underlying gene regulatory networks, their architecture and the parameters of the biochemical reactions at their core.

RESULTS: We report a novel method, called Inference for Networks of Stochastic Interactions among Genes using High-Throughput data (INSIGHT), for systematically combining high-throughput time-course flow cytometry measurements with computer-generated stochastic simulations of candidate gene network models to infer the network's stochastic model and all its parameters. By exploiting the mathematical relationships between experimental and simulated population histograms, INSIGHT achieves scalability, efficiency and accuracy while entirely avoiding approximate stochastic methods. We demonstrate our method on a synthetic gene network in bacteria and show that a detailed mechanistic model of this network can be estimated with high accuracy and high efficiency. Our method is completely general and can be used to infer models of signal-activated gene networks in any organism based solely on flow cytometry data and stochastic simulations.

AVAILABILITY: Free C source code implementing the INSIGHT algorithm, together with test data, is available from the authors.

CONTACT: mustafa.khammash@bsse.ethz.ch

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
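The idea of scoring a candidate model by comparing simulated and measured population distributions can be illustrated with a minimal sketch. It is not the INSIGHT algorithm, whose details the abstract does not give. It assumes a hypothetical one-species birth-death model simulated exactly with Gillespie's stochastic simulation algorithm, and it uses the two-sample Kolmogorov-Smirnov distance as one plausible way to compare two populations. The "measured" population here is synthetic, and all names and rate values (`simulate_birth_death`, `ks_distance`, `k_prod`, `k_deg`) are illustrative assumptions.

```python
import random


def simulate_birth_death(k_prod, k_deg, t_end, rng):
    """Exact stochastic simulation (Gillespie) of a hypothetical model:
    0 -> X at rate k_prod, X -> 0 at rate k_deg * x.
    Returns the copy number of X in one cell at time t_end."""
    t, x = 0.0, 0
    while True:
        a_prod = k_prod
        a_deg = k_deg * x
        a_total = a_prod + a_deg
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            return x
        # Pick which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        v = min(a[i], b[j])
        # Step both CDFs past every observation equal to v (handles ties).
        while i < na and a[i] == v:
            i += 1
        while j < nb and b[j] == v:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def simulate_population(k_prod, k_deg, t_end, n_cells, seed):
    """Simulate n_cells independent cells and return their copy numbers."""
    rng = random.Random(seed)
    return [simulate_birth_death(k_prod, k_deg, t_end, rng) for _ in range(n_cells)]


# Synthetic stand-in for a measured population (assumed rates, not real data).
measured = simulate_population(k_prod=10.0, k_deg=1.0, t_end=5.0, n_cells=300, seed=1)

# Two candidate parameter sets: one matching the generating rates, one not.
d_good = ks_distance(measured, simulate_population(10.0, 1.0, 5.0, 300, seed=2))
d_bad = ks_distance(measured, simulate_population(30.0, 1.0, 5.0, 300, seed=3))
print("distance for matching rates:   ", d_good)
print("distance for mismatched rates: ", d_bad)
```

In a sketch like this, a parameter search would repeat the simulate-and-compare step for many candidate parameter sets and prefer those with small distances. Real flow cytometry data would additionally require a model relating molecule counts to fluorescence, which is omitted here.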
