Model-based clustering for flow and mass cytometry data with clinical information

Background
High-dimensional flow cytometry and mass cytometry allow system-level characterization of more than 10 proteins at single-cell resolution and provide a much broader view in many biological applications, such as disease diagnosis and prediction of clinical outcome. When associating clinical information with cytometry data, traditional approaches require two distinct steps: identification of cell populations, followed by a statistical test to determine whether the difference between two population proportions is significant. Such two-step approaches can lead to information loss and analysis bias.

Results
We propose a novel statistical framework, called LAMBDA (Latent Allocation Model with Bayesian Data Analysis), for simultaneously identifying unknown cell populations and discovering associations between these populations and clinical information. LAMBDA uses probabilistic models tailored to the distinct distributional characteristics of flow and mass cytometry data, respectively. For mass cytometry data, we use a zero-inflated distribution, reflecting the large number of zero values these data typically contain. A simulation study evaluating the accuracy of the estimated parameters confirms the usefulness of this model. By analyzing real data, we also demonstrate that LAMBDA can identify associations between cell populations and clinical outcomes. LAMBDA is implemented in R and is available from GitHub (https://github.com/abikoushi/lambda).
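The abstract does not state LAMBDA's equations, so the following is only a minimal sketch of the general idea of a one-step model, under two loudly labeled assumptions: (i) subject-level clinical covariates determine the cluster (cell-population) proportions through a softmax link, and (ii) within each cluster, every marker follows a zero-inflated Gaussian and markers are conditionally independent. Neither choice is claimed to be the parameterization used in the LAMBDA R package, and the function names (`zi_gaussian_logpdf`, `mixing_weights`, `responsibilities`) are hypothetical. The sketch computes the posterior probability that a single cell belongs to each latent population.

```python
import math


def zi_gaussian_logpdf(x, pi0, mu, sigma):
    # Hypothetical zero-inflated Gaussian for one marker: with probability pi0
    # the value is a structural zero (point mass at 0); otherwise it is drawn
    # from Normal(mu, sigma**2). Density is w.r.t. a point mass at 0 plus
    # Lebesgue measure elsewhere, so the Gaussian adds no mass at exactly 0.
    if x == 0.0:
        return math.log(pi0)
    z = (x - mu) / sigma
    return (math.log(1.0 - pi0) - 0.5 * z * z
            - math.log(sigma * math.sqrt(2.0 * math.pi)))


def mixing_weights(covariates, coef):
    # Assumed softmax link: coef[k] holds cluster k's regression coefficients,
    # so clinical covariates shift the expected proportion of each population.
    scores = [sum(c * b for c, b in zip(covariates, coef_k)) for coef_k in coef]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def responsibilities(cell, covariates, coef, params):
    # params[k][j] = (pi0, mu, sigma) for cluster k and marker j.
    # Returns the posterior allocation probabilities of one cell and its
    # log marginal likelihood under the assumed mixture.
    weights = mixing_weights(covariates, coef)
    log_joint = []
    for w, cluster in zip(weights, params):
        lp = math.log(w)
        for x, (pi0, mu, sigma) in zip(cell, cluster):
            lp += zi_gaussian_logpdf(x, pi0, mu, sigma)
        log_joint.append(lp)
    log_marginal = log_sum_exp(log_joint)
    return [math.exp(lj - log_marginal) for lj in log_joint], log_marginal


# Toy example: 2 clusters x 2 markers; covariates = [intercept, responder flag].
coef = [[0.0, 0.0],   # cluster 0 is the reference category
        [0.5, 1.5]]   # cluster 1 becomes more frequent in responders
params = [
    [(0.6, 1.0, 0.5), (0.1, 3.0, 0.5)],   # cluster 0: marker 1 mostly zero
    [(0.05, 4.0, 0.5), (0.5, 1.0, 0.5)],  # cluster 1: marker 1 high
]
post, loglik = responsibilities([0.0, 3.2], [1.0, 1.0], coef, params)
print([round(p, 4) for p in post], round(loglik, 3))
```

Because the mixing weights depend on the covariates, estimating the coefficients jointly with the cluster parameters (for example, by EM or a Bayesian sampler; the estimation scheme LAMBDA actually uses is not described here) lets population discovery and clinical association happen in a single model rather than in two separate steps.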
