MEMO: multi-experiment mixture model analysis of censored data

Motivation: The statistical analysis of single-cell data is a challenge in cell biological studies. Tailored statistical models and computational methods are required to resolve the subpopulation structure, i.e. to correctly identify and characterize subpopulations. These approaches also help to unravel sources of cell-to-cell variability. Finite mixture models have shown promise, but the available approaches are ill suited to the simultaneous consideration of data from multiple experimental conditions and to censored data. The prevalence and relevance of single-cell data, together with the lack of suitable computational analytics, make automated methods that can meet the requirements posed by these data necessary.

Results: We present MEMO, a flexible mixture modeling framework that enables the simultaneous, automated analysis of censored and uncensored data acquired under multiple experimental conditions. MEMO is based on maximum-likelihood inference and allows for testing competing hypotheses. MEMO can be applied to a variety of single-cell data types. We demonstrate the advantages of MEMO by analyzing right- and interval-censored single-cell microscopy data. Our results show that examining censoring and simultaneously considering different experimental conditions are both necessary to reveal biologically meaningful subpopulation structures. MEMO allows for a stringent analysis of single-cell data and enables researchers to avoid misinterpretation of censored data. Therefore, MEMO is a valuable asset for all fields that infer the characteristics of populations by looking at single individuals, such as cell biology and medicine.

Availability and Implementation: MEMO is implemented in MATLAB and freely available via GitHub (https://github.com/MEMO-toolbox/MEMO).

Contacts: eva-maria.geissen@ist.uni-stuttgart.de or nicole.radde@ist.uni-stuttgart.de

Supplementary information: Supplementary data are available at Bioinformatics online.
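To illustrate what maximum-likelihood inference for a mixture model with censored data involves, the following is a minimal Python sketch and not MEMO's actual MATLAB implementation or API. It assumes Gaussian mixture components, which the abstract does not specify, and the function and argument names are hypothetical. Each observation is given as a (lower, upper) pair: an exactly observed value contributes the mixture density, while a right- or interval-censored value contributes the mixture probability mass of its interval.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of the same normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_log_likelihood(observations, weights, mus, sigmas):
    """Log-likelihood of a Gaussian mixture (an assumed component family).

    observations: list of (lower, upper) pairs, where
      lower == upper          -> value observed exactly,
      upper == math.inf       -> right-censored at lower,
      lower <  upper < inf    -> interval-censored in (lower, upper].
    """
    total = 0.0
    for lower, upper in observations:
        contribution = 0.0
        for w, mu, sigma in zip(weights, mus, sigmas):
            if lower == upper:
                # Uncensored: weighted component density at the observed value.
                contribution += w * normal_pdf(lower, mu, sigma)
            else:
                # Censored: weighted probability mass of the interval.
                upper_cdf = 1.0 if math.isinf(upper) else normal_cdf(upper, mu, sigma)
                contribution += w * (upper_cdf - normal_cdf(lower, mu, sigma))
        total += math.log(contribution)
    return total


# Hypothetical data: two exact values, one right-censored, one interval-censored.
data = [(0.3, 0.3), (2.1, 2.1), (3.0, math.inf), (1.0, 1.5)]
ll = mixture_log_likelihood(data, weights=[0.4, 0.6], mus=[0.0, 2.0], sigmas=[1.0, 0.5])
```

In a maximum-likelihood setting, a numerical optimizer would maximize this function over the weights, means and standard deviations. Data from several experimental conditions could be handled by summing such terms over conditions, with the parameters shared or linked between conditions in whatever way the hypothesis under test prescribes.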
