A deep generative model for single-cell RNA sequencing with application to detecting differentially expressed genes

We propose a probabilistic model for interpreting gene expression levels observed through single-cell RNA sequencing. In the model, each cell has a low-dimensional latent representation. Additional latent variables account for technical effects that may erroneously set some observations of gene expression levels to zero. Conditional distributions are specified by neural networks, giving the proposed model enough flexibility to fit the data well. We use variational inference and stochastic optimization to approximate the posterior distribution. The inference procedure scales to over one million cells, whereas competing algorithms do not. Even on smaller datasets, the proposed procedure outperforms state-of-the-art methods such as ZIFA and ZINB-WaVE on several tasks. We also extend our framework to account for batch effects and other confounding factors, and we propose a natural Bayesian hypothesis-testing framework for differential expression that outperforms the traditional DESeq2 method.
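The generative structure described above (a low-dimensional latent state per cell, plus extra latent variables that zero out some observations) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the model from the paper: the gamma-Poisson count distribution, the Bernoulli dropout mask, the single-layer random "decoder" weights, the log-normal library size, and the name `sample_cells` are all hypothetical choices made for illustration. In the proposed model the conditional distributions are parameterized by trained neural networks and the latent variables are inferred with variational inference rather than sampled forward.

```python
import numpy as np


def sample_cells(n_cells, n_genes, n_latent=10, seed=0):
    """Forward-sample counts from a toy zero-inflated latent variable model."""
    rng = np.random.default_rng(seed)

    # Hypothetical single-layer "decoders"; a real model would learn
    # neural network weights instead of drawing them at random.
    w_rate = rng.normal(scale=0.5, size=(n_latent, n_genes))
    w_drop = rng.normal(scale=0.5, size=(n_latent, n_genes))

    # Low-dimensional latent representation of each cell.
    z = rng.normal(size=(n_cells, n_latent))

    # Softmax over genes gives per-cell expression proportions.
    logits = z @ w_rate
    logits -= logits.max(axis=1, keepdims=True)
    rho = np.exp(logits)
    rho /= rho.sum(axis=1, keepdims=True)

    # Assumed log-normal library size (total expected counts per cell).
    library = rng.lognormal(mean=8.0, sigma=0.3, size=(n_cells, 1))

    # Gamma-Poisson mixture (an over-dispersed count model, assumed here).
    theta = 2.0  # assumed inverse-dispersion parameter
    rate = rng.gamma(shape=theta, scale=library * rho / theta)
    counts = rng.poisson(rate)

    # Technical dropout: a latent Bernoulli mask sets some entries to zero.
    p_drop = 1.0 / (1.0 + np.exp(-(z @ w_drop - 1.0)))
    dropout = rng.random((n_cells, n_genes)) < p_drop
    observed = np.where(dropout, 0, counts)

    return z, counts, observed


z, counts, observed = sample_cells(n_cells=100, n_genes=50)
print("extra zeros from dropout:", int((observed == 0).sum() - (counts == 0).sum()))
```

Comparing `counts` with `observed` shows how the dropout mask adds zeros on top of those already produced by the count distribution, which is the technical effect the additional latent variables are meant to capture.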
