Reverse-engineering flow-cytometry gating strategies for phenotypic labelling and high-performance cell sorting

Motivation: Recent flow and mass cytometers generate datasets with 20 to 40 dimensions and a million single cells. Many tools use these data to discover new cell populations associated with diseases or physiology. Each new population requires a new gating strategy, but gating strategies become exponentially more difficult to optimize as dimensionality increases. To facilitate this step, we developed Hypergate, an algorithm that, given a cell population of interest, identifies a gating strategy optimized for high yield and purity.

Results: Hypergate achieves higher yield and purity than human experts, Support Vector Machines and Random Forests on public datasets. We used it to revisit established gating strategies for the identification of innate lymphoid cells. It identified concise and efficient strategies that gate these cells with fewer parameters, yet with higher yield and purity, than the current standards. For phenotypic description, Hypergate's outputs are consistent with knowledge in the field and are sparser than those of a competing method.

Availability and implementation: Hypergate is implemented in R and available on CRAN. The source code is published at http://github.com/ebecht/hypergate under an Open Source Initiative-compliant licence.

Supplementary information: Supplementary data are available at Bioinformatics online.
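To make "yield" and "purity" concrete, the following is a minimal sketch under stated assumptions; it is not Hypergate's implementation, which is the R package on CRAN. It assumes a gating strategy can be represented as a set of per-marker intervals (a hyperrectangle), encoded here as a hypothetical dict mapping a column index to a (low, high) pair. Yield is taken as the fraction of target cells that fall inside the gate, and purity as the fraction of gated cells that belong to the target population. The F-beta score used to combine the two is an illustrative assumption, and the function names `apply_gate`, `yield_and_purity` and `f_beta` are hypothetical.

```python
import numpy as np


def apply_gate(data, gate):
    """Boolean mask of events lying inside every interval of the gate.

    data: 2-D array (events x markers).
    gate: hypothetical encoding, dict mapping column index -> (low, high).
    """
    mask = np.ones(data.shape[0], dtype=bool)
    for col, (low, high) in gate.items():
        # Keep only events whose value on this marker lies in [low, high].
        mask &= (data[:, col] >= low) & (data[:, col] <= high)
    return mask


def yield_and_purity(mask, is_target):
    """Yield: fraction of target events captured by the gate.
    Purity: fraction of gated events that are target events."""
    n_target = int(is_target.sum())
    n_gated = int(mask.sum())
    true_pos = int((mask & is_target).sum())
    yld = true_pos / n_target if n_target else 0.0
    pur = true_pos / n_gated if n_gated else 0.0
    return yld, pur


def f_beta(yld, pur, beta=1.0):
    """Assumed single objective combining purity (precision) and yield (recall)."""
    b2 = beta ** 2
    denom = b2 * pur + yld
    return (1 + b2) * pur * yld / denom if denom else 0.0


# Toy example: 5 events measured on 2 markers (synthetic values).
data = np.array([
    [5.0, 1.0],  # target
    [6.0, 0.5],  # target
    [3.5, 2.0],  # target, but below the marker-0 threshold
    [1.0, 1.0],  # non-target
    [5.5, 2.2],  # non-target, yet inside the gate
])
is_target = np.array([True, True, True, False, False])

# "marker 0 high, marker 1 low" expressed as intervals.
gate = {0: (4.0, np.inf), 1: (-np.inf, 2.5)}

mask = apply_gate(data, gate)             # [True, True, False, False, True]
yld, pur = yield_and_purity(mask, is_target)  # yield = purity = 2/3 here
score = f_beta(yld, pur)
```

In this toy case, tightening the marker-0 threshold would exclude the non-target event but lose more target cells. That trade-off between yield and purity is what a gate optimizer has to balance, and the number of candidate threshold combinations grows quickly with the number of markers.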
