Recent Advances in Computer-Assisted Algorithms for Cell Subtype Identification of Cytometry Data

Progress in high-dimensional cytometry has greatly increased the number of markers that can be analyzed simultaneously, producing datasets with large numbers of parameters. Traditional biaxial manual gating, which inspects two markers at a time, might not be optimal for such datasets. To address this limitation, many automated tools have been developed to aid the clustering of cells in multidimensional datasets. Here we review two broad categories of such tools: unsupervised and supervised clustering tools. After a thorough review of the popularity and use of each available unsupervised clustering tool, we focus on the top six tools and discuss their advantages and limitations. Furthermore, we use a publicly available dataset to directly compare the usability, speed, and relative effectiveness of the available unsupervised and supervised tools. Finally, we discuss the current challenges for existing methods and future directions for the new generation of cell type identification approaches.
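Two ideas in this summary can be made concrete with a short sketch. First, the number of distinct two-marker (biaxial) plots grows quadratically with the number of markers, as n(n-1)/2, which is one reason manual gating scales poorly as panels grow. Second, the "relative effectiveness" of a clustering tool is often judged by how well its clusters agree with reference labels such as manual gates. The abstract does not name a metric, so the adjusted Rand index below is an assumption, shown only as one common choice. The function names `biaxial_plot_count` and `adjusted_rand_index` are hypothetical helpers written for illustration and are not the authors' code or any specific package's API.

```python
from collections import Counter
from math import comb


def biaxial_plot_count(n_markers: int) -> int:
    """Number of distinct two-marker scatter plots for n markers (n choose 2)."""
    return comb(n_markers, 2)


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Adjusted Rand index between two labelings of the same cells.

    Illustrative assumption: labels_true are reference (e.g. manual-gate)
    labels and labels_pred are cluster IDs from an automated tool.
    1.0 means identical partitions up to relabeling; values near 0 mean
    agreement no better than chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must cover the same cells")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two cells to compare pairs")

    # Contingency table: cells sharing each (reference label, cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)

    # Count cell pairs grouped together in both / either labeling.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    total_pairs = comb(n, 2)

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only happens when both labelings put every cell in one group,
        # or both put every cell in its own group: identical partitions.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

For example, a hypothetical 40-marker panel gives `biaxial_plot_count(40) == 780` pairwise plots. Because the index ignores how clusters are numbered, `adjusted_rand_index(["T", "T", "B", "B"], [1, 1, 0, 0])` returns 1.0 even though the cluster IDs differ from the gate names.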
