dropClust: efficient clustering of ultra-large scRNA-seq data

Abstract

Droplet-based single-cell transcriptomics has recently enabled the parallel screening of tens of thousands of single cells. Clustering methods that scale to such high-dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing (LSH), an approximate nearest-neighbour search technique, to develop dropClust, a de novo clustering algorithm for large-scale single-cell data. On a number of real datasets, dropClust outperformed the existing best-practice methods in terms of execution time, clustering accuracy and detectability of minor cell sub-types.
