PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells

Motivation: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods scale poorly to large datasets and therefore cannot uncover the complex cellular heterogeneity within them.

Results: We introduce PARC (phenotyping by accelerated refined community-partitioning), a highly scalable graph-based clustering algorithm for ultralarge-scale, high-dimensional single-cell data (> 1 million cells). Using large single-cell mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that, without sub-sampling of cells, PARC consistently outperforms state-of-the-art clustering algorithms, including Phenograph, FlowSOM and Flock, in both speed and the ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1M cells within 13 minutes, compared with >2 hours for the next fastest graph-clustering algorithm, Phenograph. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis.

Availability and Implementation: https://github.com/ShobiStassen/PARC
