Robust calibration of hierarchical population models for heterogeneous cell populations

Cellular heterogeneity is known to have important effects on signal processing and cellular decision making. To understand these processes, multiple classes of mathematical models have been introduced. The hierarchical population model constitutes a novel class that allows for a mechanistic description of heterogeneity and explicitly takes subpopulation structures into account. However, this model requires a parametric distribution assumption for the cell population and, so far, only the normal distribution has been employed. Here, we incorporate alternative distribution assumptions into the model, assess their robustness against outliers, and evaluate their influence on the performance of model calibration in a simulation study and a real-world application example. We found that alternative distributions provide reliable parameter estimates even in the presence of outliers, and can in fact improve the convergence of model calibration.

Highlights
• Generalizes hierarchical population model to various distribution assumptions
• Provides framework for efficient calibration of the hierarchical population model
• Simulation study and application to experimental data reveal improved robustness and optimization performance
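The robustness claim can be illustrated with a minimal sketch. It assumes a one-dimensional location-scale problem rather than the hierarchical population model itself, and it uses a Student's t distribution with fixed degrees of freedom as one possible heavy-tailed alternative to the normal distribution. The function name `fit_student_t`, the synthetic data, and all parameter values are hypothetical choices made for illustration only.

```python
import random
import statistics


def fit_student_t(xs, nu=3.0, iters=200):
    """EM estimate of location and scale for a Student's t model.

    The degrees of freedom `nu` are held fixed (an assumption of this sketch).
    """
    mu = statistics.median(xs)        # robust starting point
    sigma2 = statistics.pvariance(xs)
    n = len(xs)
    for _ in range(iters):
        # E-step: points far from the current location get small weights.
        w = [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in xs]
        # M-step: weighted mean and weighted squared scale.
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        sigma2 = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / n
    return mu, sigma2 ** 0.5


# Synthetic measurements: 95 regular cells plus 5 gross outliers.
rng = random.Random(0)
true_mean = 2.0
data = [rng.gauss(true_mean, 0.5) for _ in range(95)] + [20.0] * 5

# Under a normal assumption the maximum likelihood location is the sample mean.
normal_mean = sum(data) / len(data)
t_mean, t_scale = fit_student_t(data)

print(f"normal assumption:    location = {normal_mean:.3f}")
print(f"Student's t (nu = 3): location = {t_mean:.3f}, scale = {t_scale:.3f}")
```

In this toy setting the sample mean is pulled toward the outliers, whereas the t-based estimate down-weights them and stays close to the location of the bulk of the data.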
