On the discovery of subpopulation-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data

Single-cell RNA sequencing (scRNA-seq) has quickly become an empowering technology for profiling the transcriptomes of individual cells on a large scale. Many early analyses of differential expression aimed to identify differences between subpopulations and thus focused on finding subpopulation markers, either in a single sample or across multiple samples. More generally, such methods can compare expression levels between multiple sets of cells, which also allows cross-condition analyses. However, given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here differential state analysis. For example, one could investigate the condition-specific responses of cell subpopulations measured in patients from each condition; however, it is not clear which statistical framework best handles this situation. In this work, we surveyed the methods available to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated “pseudobulk” data. We developed a flexible simulation platform that mimics both single- and multi-sample scRNA-seq data, and we provide robust tools for multi-condition analysis within the muscat R package.
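To make the idea of aggregated “pseudobulk” data concrete, the following is a minimal sketch in Python, assuming that aggregation means summing raw counts over all cells that share a subpopulation and a sample. Summation is only one possible aggregation choice, and the function name `pseudobulk` and the toy data are hypothetical illustrations, not the muscat API.

```python
from collections import defaultdict


def pseudobulk(counts, sample_ids, cluster_ids):
    """Sum per-cell gene counts within each (cluster, sample) pair.

    counts      -- list of per-cell count vectors (one integer per gene)
    sample_ids  -- sample label of each cell
    cluster_ids -- subpopulation (cluster) label of each cell

    Hypothetical helper for illustration; sum aggregation is an assumption.
    """
    if not (len(counts) == len(sample_ids) == len(cluster_ids)):
        raise ValueError("one sample label and one cluster label per cell")
    n_genes = len(counts[0]) if counts else 0
    sums = defaultdict(lambda: [0] * n_genes)
    for cell, sample, cluster in zip(counts, sample_ids, cluster_ids):
        profile = sums[(cluster, sample)]
        for gene_index, value in enumerate(cell):
            profile[gene_index] += value
    return dict(sums)


# Toy data (made up): 6 cells x 2 genes, two samples, two subpopulations.
counts = [[1, 0], [2, 1], [0, 3], [4, 0], [1, 1], [0, 2]]
samples = ["ctrl1", "ctrl1", "ctrl1", "stim1", "stim1", "stim1"]
clusters = ["T", "T", "B", "T", "B", "B"]

pb = pseudobulk(counts, samples, clusters)
for key in sorted(pb):
    print(key, pb[key])
```

Each resulting profile represents one sample within one subpopulation, so a sample-level differential state test would compare these profiles across conditions separately for each subpopulation, rather than treating individual cells as replicates.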
