Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF

The rapid proliferation of single-cell RNA-Sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most existing algorithms require significant user tuning, rely heavily on dimensionality reduction techniques, and do not scale to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene Selection (ICGS), which applies intra-gene correlation and hybrid clustering to resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface. Here, we describe a new iteration of ICGS (ICGS2) that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse-NMF, cluster "fitness", SVM) to resolve rare and common cell states while minimizing differences due to donor or batch effects. Using data from the Human Cell Atlas, we show that the PageRank algorithm effectively downsamples ultra-large scRNA-Seq datasets without losing extremely rare cell types or distinct but transcriptionally similar ones, while still recovering novel, transcriptionally unique cell populations. We believe this new approach holds considerable promise for reproducibly resolving hidden cell populations in complex datasets.

Highlights

- ICGS2 outperforms alternative approaches in small and ultra-large benchmark datasets
- Integrates multiple solutions for cell-type detection with supervised refinement
- Scales effectively to resolve rare cell states from ultra-large datasets using PageRank sampling with a low memory footprint
- Integrated into AltAnalyze to enable sophisticated and automated downstream analysis
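To make the idea of PageRank-based downsampling concrete, the following is a minimal sketch, not the ICGS2 implementation. It assumes cells are rows of a numeric matrix, builds a symmetric k-nearest-neighbour graph, scores each cell by PageRank via power iteration, and keeps the highest-scoring cells. The function names (`knn_graph`, `pagerank`, `pagerank_downsample`), the choice of `k`, the damping factor, and the "keep the top-scoring cells" rule are all illustrative assumptions; the abstract does not specify how the graph is built or how cells are selected.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetric 0/1 adjacency linking each row of X to its k nearest rows.

    Uses dense squared Euclidean distances, so it only suits small demos.
    Requires k < number of rows.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)            # a cell is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]    # indices of the k closest cells
    n = X.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)               # make the graph undirected


def pagerank(A, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank scores by power iteration on adjacency matrix A.

    Assumes every node has at least one edge, which holds for knn_graph.
    """
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)    # row-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1.0 - damping) / n + damping * (P.T @ r)
        converged = np.abs(r_new - r).sum() < tol
        r = r_new
        if converged:
            break
    return r


def pagerank_downsample(X, n_keep, k=10):
    """Return sorted indices of the n_keep highest-PageRank rows, plus all scores."""
    scores = pagerank(knn_graph(X, k))
    keep = np.argsort(scores)[::-1][:n_keep]
    return np.sort(keep), scores


# Illustrative usage on synthetic data (hypothetical sizes).
rng = np.random.default_rng(0)
cells = rng.normal(size=(200, 5))
kept, scores = pagerank_downsample(cells, n_keep=50, k=10)
```

The dense distance matrix keeps the sketch short but grows quadratically with the number of cells; a dataset of the size discussed above would need a sparse or approximate nearest-neighbour graph instead.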
