Identification of cell types, states and programs by learning gene set representations

As single-cell molecular data expand, there is an increasing need for algorithms that efficiently query and prioritize gene programs, cell types and cell states in single-cell sequencing data, particularly in cell atlases. Here we present scDECAF, a statistical learning algorithm that identifies cell types, states and programs in single-cell gene expression data using vector representations of gene sets, and that aids biological interpretation by selecting the subset of most biologically relevant programs. We apply scDECAF to scRNA-seq data from PBMC, lung, pancreas and brain, and to Slide-tags snRNA-seq data from human prefrontal cortex, for automatic cell type annotation. We demonstrate that scDECAF can recover perturbed gene programs in lupus PBMCs stimulated with IFN-β and in TGF-β-induced cells undergoing epithelial-to-mesenchymal transition. scDECAF also delineates patient-specific heterogeneity in cellular programs in ovarian cancer data. Using a healthy PBMC reference, we apply scDECAF to a PBMC COVID-19 case-control query dataset mapped to that reference and identify multicellular programs associated with severe COVID-19. scDECAF can thus complement reference mapping analysis and provides a method for gene set and pathway analysis in single-cell gene expression data.
