cytometree: a binary tree algorithm for automatic gating in cytometry analysis

Motivation: Flow cytometry is a powerful technology that allows the high-throughput quantification of dozens of surface and intracellular proteins at the single-cell level. Over the past three decades, it has become the most widely used technology for immunophenotyping of cells. As cytometry experiments grow more complex (more cells and more markers), traditional manual analysis of flow cytometry data has become untenable because it is subjective and time-consuming.

Results: We present a new unsupervised algorithm called “cytometree” to perform automated population discovery (also known as gating) in flow cytometry. cytometree is based on the construction of a binary tree whose nodes are subpopulations of cells. At each node, the marker distributions are modeled by mixtures of normal distributions. Node splitting is decided according to a normalized difference of Akaike information criteria (AIC) between two competing models. Post-processing of the tree structure then allows us to complete the annotation of the derived populations. On panels introduced by the Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP I) project, the algorithm performs better than the state-of-the-art unsupervised algorithms previously proposed. On a T-cell panel proposed by the Human Immunology Project Consortium (HIPC) program, it also outperforms the best available open-source unsupervised algorithm while requiring the shortest computation time.

Availability: An R package named “cytometree” is available on the CRAN repository.

Contact: daniel.commenges@u-bordeaux.fr; rodolphe.thiebaut@u-bordeaux.fr

Supplementary information: Supplementary data are available.
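The node-splitting rule described in the Results can be illustrated with a small sketch. It is not the code of the cytometree R package; it only shows the general idea of comparing two models for one marker through a normalized AIC difference. Several points are assumptions made for illustration: the two models are taken to be a single normal distribution versus a two-component normal mixture, the difference is normalized as (AIC_1 - AIC_2) / (2n), the threshold of 0.1 is arbitrary, and all function names are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2.0 * math.pi * sigma * sigma)
            - (x - mu) ** 2 / (2.0 * sigma * sigma))


def aic_single_normal(xs):
    """AIC of one normal fitted by maximum likelihood (2 free parameters)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = max(math.sqrt(sum((x - mu) ** 2 for x in xs) / n), 1e-6)
    loglik = sum(log_normal_pdf(x, mu, sigma) for x in xs)
    return 2 * 2 - 2.0 * loglik


def aic_two_normals(xs, n_iter=100):
    """AIC of a two-component normal mixture fitted by a basic EM loop
    (5 free parameters: two means, two standard deviations, one weight)."""
    n = len(xs)
    ordered = sorted(xs)
    lower, upper = ordered[: n // 2], ordered[n // 2:]
    # Initialise the two means on either side of the median.
    mu = [sum(lower) / len(lower), sum(upper) / len(upper)]
    grand_mean = sum(xs) / n
    spread = max(math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n), 1e-6)
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    loglik = 0.0
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point, in log space.
        resp = []
        loglik = 0.0
        for x in xs:
            la = math.log(weight[0]) + log_normal_pdf(x, mu[0], sigma[0])
            lb = math.log(weight[1]) + log_normal_pdf(x, mu[1], sigma[1])
            top = max(la, lb)
            log_total = top + math.log(math.exp(la - top) + math.exp(lb - top))
            resp.append(math.exp(la - log_total))
            loglik += log_total
        # M-step: update weights, means and standard deviations.
        n0 = min(max(sum(resp), 1e-9), n - 1e-9)
        n1 = n - n0
        mu[0] = sum(r * x for r, x in zip(resp, xs)) / n0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n1
        var0 = sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum((1.0 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1
        sigma = [max(math.sqrt(var0), 1e-6), max(math.sqrt(var1), 1e-6)]
        weight = [n0 / n, n1 / n]
    # loglik is evaluated at the parameters used in the last E-step.
    return 2 * 5 - 2.0 * loglik


def normalized_aic_difference(xs):
    """Assumed normalization: (AIC_one - AIC_two) / (2n)."""
    return (aic_single_normal(xs) - aic_two_normals(xs)) / (2.0 * len(xs))


def should_split(xs, threshold=0.1):
    """Split the node on this marker if the mixture is clearly preferred.
    The threshold value is illustrative only."""
    return normalized_aic_difference(xs) > threshold


if __name__ == "__main__":
    random.seed(0)
    one_population = [random.gauss(0.0, 1.0) for _ in range(600)]
    two_populations = ([random.gauss(0.0, 1.0) for _ in range(300)]
                       + [random.gauss(6.0, 1.0) for _ in range(300)])
    print("split unimodal marker:", should_split(one_population))
    print("split bimodal marker:", should_split(two_populations))
```

In a tree-building loop, a criterion of this kind would be evaluated for every marker at a node; one plausible choice is to split on the marker with the largest normalized difference and then recurse on the two resulting subpopulations. Dividing by the sample size keeps the criterion comparable across nodes that contain very different numbers of cells.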
