Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF

Abstract

Motivation: The rapid proliferation of single-cell RNA-sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most require significant user-tuning, are heavily reliant on dimension reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene Selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface.

Results: We describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse non-negative matrix factorization, cluster 'fitness', support vector machine) to resolve rare and common cell-states, while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively downsamples ultra-large scRNA-Seq datasets, without losing extremely rare or transcriptionally similar yet distinct cell types and while recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets.

Availability and implementation: ICGS2 is implemented in Python. The source code and documentation are available at http://altanalyze.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
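To make the PageRank downsampling idea concrete, the following is a minimal sketch and not the ICGS2 implementation. The abstract does not say how the graph over cells is built, so the sketch assumes a directed k-nearest-neighbour graph on Euclidean distances; the function names (`knn_adjacency`, `pagerank`, `downsample`) and the parameters `k`, `damping` and `n_keep` are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_adjacency(X, k):
    """Directed k-nearest-neighbour graph: row i has an edge to each of its k neighbours.

    ASSUMPTION: Euclidean distance on the rows of X (cells); the abstract does not
    specify the graph construction.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances between cells.
    d = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d, np.inf)  # a cell is never its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return A


def pagerank(A, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank scores by power iteration on adjacency matrix A (A[i, j] = edge i -> j)."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    # Row-normalise outgoing edges; nodes without out-edges jump uniformly.
    P = np.where(out > 0, A / np.where(out > 0, out, 1.0), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1.0 - damping) / n + damping * (P.T @ r)
        converged = np.abs(r_new - r).sum() < tol
        r = r_new
        if converged:
            break
    return r / r.sum()


def downsample(X, n_keep, k=10):
    """Return the sorted indices of the n_keep cells with the highest PageRank score."""
    scores = pagerank(knn_adjacency(X, k))
    top = np.argsort(scores)[::-1][:n_keep]
    return np.sort(top)
```

Under these assumptions, `downsample(X, n_keep=2500)` on a cells-by-features matrix `X` would return the row indices to retain for downstream clustering. The dense distance matrix limits this sketch to small inputs; an ultra-large dataset would need a sparse or approximate neighbour search instead.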
