Determination of essential phenotypic elements of clusters in high-dimensional entities—DEPECHE

Technological advances have facilitated an exponential increase in the amount of information that can be derived from single cells, necessitating new computational tools that can make such highly complex data interpretable. Here, we introduce DEPECHE, a rapid, parameter-free, sparse k-means-based algorithm for clustering of multi- and megavariate single-cell data. In a number of computational benchmarks aimed at evaluating the capacity to form biologically relevant clusters, including flow/mass-cytometry and single-cell RNA sequencing datasets with manually curated gold standard solutions, DEPECHE clusters as well as or better than the best-performing clustering algorithms currently available. However, the main advantage of DEPECHE over the state of the art is its unique ability to enhance the interpretability of the formed clusters: it retains only the variables relevant for cluster separation, thereby facilitating computationally efficient analyses as well as understanding of complex datasets. DEPECHE is implemented in the open-source R package DepecheR, currently available at github.com/Theorell/DepecheR.
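
To illustrate how a sparse k-means variant can retain only the variables relevant for cluster separation, the following is a minimal sketch of L1-penalized k-means. It is not the DepecheR implementation. The function names (`soft_threshold`, `penalized_kmeans`, `informative_variables`), the farthest-first initialization, and the toy data are all assumptions made for this example. The sketch also assumes that each variable has been centered at zero, and it takes the penalty as a hand-set argument, whereas DEPECHE is described as parameter-free; how the penalty would be chosen automatically is not shown here.

```python
def soft_threshold(value, amount):
    """Shrink value toward zero; anything within `amount` of zero becomes 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def penalized_kmeans(points, k, penalty, iterations=20):
    """Hypothetical L1-penalized k-means on zero-centered data.

    Minimizes sum ||x - mu_c||^2 + penalty * sum |mu_cd| by alternating
    nearest-center assignment with a soft-thresholded mean update.
    """
    # Farthest-first initialization (an assumption chosen for determinism).
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: per-variable mean, shrunk by penalty / (2 * cluster size).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # leave an empty cluster's center unchanged
            n = len(members)
            for d in range(len(centers[c])):
                mean = sum(m[d] for m in members) / n
                centers[c][d] = soft_threshold(mean, penalty / (2 * n))
    return labels, centers


def informative_variables(centers):
    """Indices of variables with a non-zero coordinate in at least one center."""
    return [d for d in range(len(centers[0])) if any(c[d] != 0.0 for c in centers)]


# Toy data: variable 0 separates two groups, variables 1 and 2 are small noise.
cells = [
    (-2.0, 0.1, -0.1), (-2.1, -0.1, 0.1), (-1.9, 0.05, 0.0), (-2.0, -0.05, 0.0),
    (2.0, 0.1, 0.1), (2.1, -0.1, -0.1), (1.9, 0.0, 0.05), (2.0, 0.0, -0.05),
]
labels, centers = penalized_kmeans(cells, k=2, penalty=2.0)
print(labels, informative_variables(centers))
```

On this toy input, the noise variables are shrunk to exactly zero in both cluster centers, so only variable 0 is reported as informative. Variables whose center coordinates are zero in every cluster contribute nothing to telling clusters apart, which is the sense in which a penalized approach of this kind can make clusters easier to interpret.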
