Predicting cell types in single cell mass cytometry data

Motivation: Mass cytometry (CyTOF) is a valuable technology for high-dimensional analysis at the single-cell level. Identifying the different cell populations is an important task in the data analysis. Many clustering tools can perform this task; however, they are time-consuming, often involve a manual step, and lack reproducibility when new data are added to the analysis. Learning cell types from an annotated set of cells solves these problems. However, currently available mass cytometry classifiers are either complex, dependent on prior knowledge of the cell type markers during the learning process, or able to identify only canonical cell types.

Results: We propose to use a Linear Discriminant Analysis (LDA) classifier to automatically identify cell populations in CyTOF data. LDA shows comparable results with two state-of-the-art algorithms on four benchmark datasets and also outperforms a non-linear classifier, the k-nearest neighbour classifier. To illustrate its scalability to large datasets with deeply annotated cell subtypes, we apply LDA to a dataset of ~3.5 million cells representing 57 cell types. LDA performs well on abundant cell types as well as on the majority of rare cell types, and provides accurate estimates of cell type frequencies. Incorporating a rejection option, based on the estimated posterior probabilities, further allows LDA to identify cell types that were not encountered during training. Altogether, reproducible prediction of cell type compositions using LDA opens up possibilities to analyse large cohort studies based on mass cytometry data.

Availability: Implementation is available on GitHub (https://github.com/tabdelaal/CyTOF-Linear-Classifier).

Contact: a.mahfouz@lumc.nl
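The rejection option described in the Results can be illustrated with a minimal sketch. This is not the authors' implementation (see the GitHub repository for that). It is a small NumPy version of LDA with a pooled covariance matrix, in which a cell is labelled unknown when its highest posterior probability falls below a threshold. The function names, the toy two-marker data, the threshold of 0.7, and the label `-1` for rejected cells are all assumptions made for illustration.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class means, priors, and the inverse pooled covariance."""
    classes = np.unique(y)
    n, d = X.shape
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    pooled = np.zeros((d, d))
    for c, mu in zip(classes, means):
        diff = X[y == c] - mu
        pooled += diff.T @ diff
    pooled /= n - len(classes)  # unbiased pooled within-class covariance
    return classes, means, priors, np.linalg.inv(pooled)

def predict_with_rejection(model, X, threshold=0.7, unknown=-1):
    """Assign the most probable class, or `unknown` if the posterior is low.

    `threshold` and `unknown` are hypothetical choices for this sketch.
    """
    classes, means, priors, precision = model
    # Linear discriminant: x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log(prior_k)
    W = means @ precision
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    scores = X @ W.T + b
    # Softmax over the discriminant scores gives the posterior probabilities
    scores = scores - scores.max(axis=1, keepdims=True)
    post = np.exp(scores)
    post /= post.sum(axis=1, keepdims=True)
    labels = classes[post.argmax(axis=1)]
    labels = np.where(post.max(axis=1) < threshold, unknown, labels)
    return labels, post

# Toy training set: two "cell types" measured on two hypothetical markers
offsets = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
X_train = np.vstack([offsets, offsets + 4.0])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
model = fit_lda(X_train, y_train)

# Two cells at the class centres and one cell halfway between them
X_new = np.array([[0.0, 0.0], [4.0, 4.0], [2.0, 2.0]])
labels, post = predict_with_rejection(model, X_new)
print(labels.tolist())  # [0, 1, -1]
```

The third cell sits exactly between the two class means, so its posterior is split evenly and it is rejected. In this sketch, rejection only catches cells that fall near a decision boundary; a cell far from every training class but clearly on one side of the boundary would still receive a confident label.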
