MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA-seq data

Single-cell transcriptomic profiling enables unprecedented interrogation of gene expression heterogeneity in rare cell populations that would otherwise be obscured in bulk RNA sequencing experiments. The stochastic nature of transcription is revealed in the bimodality of single-cell transcriptomic data, a feature shared across single-cell expression platforms. There is, however, a paucity of computational tools that take advantage of this characteristic. We present a new methodology for analyzing single-cell transcriptomic data that models this bimodality within a coherent generalized linear modeling framework. Specifically, we propose a two-part generalized linear model that characterizes biological changes both in the proportion of cells expressing each gene and in the mean expression level of that gene among the cells that express it. We introduce the cellular detection rate, the fraction of genes turned on in a cell, and show how it can be used to adjust simultaneously for technical variation and so-called “extrinsic noise” at the single-cell level without the use of control genes. Our model permits direct inference on statistics formed by collections of genes, facilitating gene set enrichment analysis. The residuals defined by such models can be manipulated to interrogate cellular heterogeneity and gene-gene correlation across cells and conditions, providing insights into the temporal evolution of networks of co-expressed genes at the single-cell level. Using two single-cell RNA-seq datasets, including newly generated data from mucosal-associated invariant T (MAIT) cells, we show how model residuals can be used to identify significant changes across biologically relevant gene sets that are missed by other methods, and to characterize cellular heterogeneity in response to stimulation.
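The two-part structure described in the abstract can be illustrated with a minimal sketch. It is not the MAST package's API: the function names (`cellular_detection_rate`, `fit_two_part`, `fit_logistic`, `simulate_gene`) are hypothetical, and the sketch assumes log-scale expression values in which 0 means "not detected", a logistic regression for the detection part, and an ordinary least-squares fit for the positive part. The small ridge term in the logistic fit is an assumption added only for numerical stability.

```python
import numpy as np


def cellular_detection_rate(expr):
    """Fraction of genes with expression > 0 in each cell (expr: cells x genes)."""
    return (expr > 0).mean(axis=1)


def fit_logistic(X, z, n_iter=50, ridge=1e-6):
    """Logistic regression via Newton-Raphson steps.

    The tiny ridge penalty is an illustrative assumption for numerical
    stability; it is not taken from the paper.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (z - p) - ridge * beta
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


def fit_two_part(y, condition, cdr):
    """Fit both parts of a two-part model for one gene.

    y         : expression per cell (0 = not detected, > 0 = log-scale level)
    condition : 0/1 indicator per cell (e.g. unstimulated vs. stimulated)
    cdr       : cellular detection rate per cell, used as a covariate
    Returns (detection coefficients, positive-expression coefficients),
    each ordered as [intercept, condition, cdr].
    """
    X = np.column_stack([np.ones_like(cdr), condition, cdr])
    # Part 1: is the gene detected in the cell at all?
    detected = (y > 0).astype(float)
    beta_discrete = fit_logistic(X, detected)
    # Part 2: expression level, fitted only on cells where the gene is detected.
    pos = y > 0
    beta_continuous, *_ = np.linalg.lstsq(X[pos], y[pos], rcond=None)
    return beta_discrete, beta_continuous


def simulate_gene(rng, n_cells):
    """Hypothetical toy data: the condition raises both detection and level."""
    condition = rng.integers(0, 2, size=n_cells)
    cdr = rng.uniform(0.2, 0.8, size=n_cells)
    logit = -1.0 + 1.5 * condition + 2.0 * (cdr - 0.5)
    is_detected = rng.random(n_cells) < 1.0 / (1.0 + np.exp(-logit))
    level = 2.0 + 1.0 * condition + rng.normal(0.0, 0.5, size=n_cells)
    y = np.where(is_detected, np.maximum(level, 0.1), 0.0)
    return y, condition, cdr
```

In this sketch the condition coefficient of the first part reflects a change in the proportion of expressing cells (on the log-odds scale), while the condition coefficient of the second part reflects a change in mean expression among expressing cells. Including the cellular detection rate as a covariate in both parts is one simple way to adjust for it.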
