Detection of differentially abundant cell subpopulations discriminates biological states in scRNA-seq data

Abstract Single-cell RNA sequencing (scRNA-seq) has recently been applied to compare transcriptomic landscapes between biological conditions or states. Such comparisons can reveal cell subpopulations that are differentially abundant between the conditions. Typically, these subpopulations are identified as clusters or cell types in which the proportion of cells from the two states deviates significantly from the corresponding proportion in the entire cell population. This approach can be suboptimal because the most differentially abundant subpopulations may not coincide with these clusters or cell types. Here, we develop DA-seq, a multiscale algorithm that detects the cell subpopulations with the most disproportionate cell abundances between two samples. DA-seq does not require prior partitioning of the data by clustering or cell subtype. To evaluate DA-seq, we applied it to one simulated and two real scRNA-seq datasets. The first real dataset was obtained from a study of melanoma responders and non-responders to checkpoint immunotherapy. The second dataset is from a study that compares two time points of developing embryonic mouse skin.
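To make the cluster-based baseline described in the abstract concrete, the following is a minimal Python sketch, not DA-seq itself. It assumes each cell already carries a cluster label and a state label, and it compares each cluster's share of one state against that state's share in the whole population. The exact binomial test, the function names (`binom_two_sided_p`, `cluster_abundance_test`), and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import comb


def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def cluster_abundance_test(clusters, states, state_a):
    """For each cluster, test whether its fraction of `state_a` cells
    deviates from the fraction of `state_a` in the entire population."""
    overall = sum(s == state_a for s in states) / len(states)
    sizes = Counter(clusters)
    hits = Counter(c for c, s in zip(clusters, states) if s == state_a)
    return {
        c: {
            "fraction": hits[c] / n,
            "overall": overall,
            "p_value": binom_two_sided_p(hits[c], n, overall),
        }
        for c, n in sizes.items()
    }


# Hypothetical toy data: cluster "c1" is dominated by responders ("R").
clusters = ["c1"] * 20 + ["c2"] * 20
states = ["R"] * 18 + ["NR"] * 2 + ["R"] * 10 + ["NR"] * 10
result = cluster_abundance_test(clusters, states, "R")

for name, stats in sorted(result.items()):
    print(name, stats)
```

Because the test is run once per predefined cluster, a differentially abundant subpopulation that spans only part of a cluster, or straddles two clusters, can be diluted. That is the limitation the abstract says motivates a partition-free approach. In practice the per-cluster p-values would also need a multiple-testing correction, which this sketch omits.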
