Deep Generative Modeling for Single-cell Transcriptomics

Single-cell transcriptome measurements can reveal unexplored biological diversity, but they suffer from technical noise and bias that must be modeled to account for the resulting uncertainty in downstream analyses. Here we introduce single-cell variational inference (scVI), a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells (https://github.com/YosefLab/scVI). scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. We used scVI for a range of fundamental analysis tasks including batch correction, visualization, clustering, and differential expression, and achieved high accuracy for each task.scVI is a ready-to-use generative deep learning tool for large-scale single-cell RNA-seq data that enables raw data processing and a wide range of rapid and accurate downstream analyses.

[1]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[2]  J. Gribben,et al.  Chronic lymphocytic leukemia cells induce changes in gene expression of CD4 and CD8 T cells. , 2005, The Journal of clinical investigation.

[3]  Andrew Gelman,et al.  Data Analysis Using Regression and Multilevel/Hierarchical Models , 2006 .

[4]  Cheng Li,et al.  Adjusting batch effects in microarray expression data using empirical Bayes methods. , 2007, Biostatistics.

[5]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[6]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[7]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[8]  Yoshua Bengio,et al.  Algorithms for Hyper-Parameter Optimization , 2011, NIPS.

[9]  Peter J. Bickel,et al.  Measuring reproducibility of high-throughput experiments , 2011, 1110.4705.

[10]  Eva K. Lee,et al.  Systems Biology of Seasonal Influenza Vaccination in Humans , 2011, Nature Immunology.

[11]  P. Kharchenko,et al.  Bayesian approach to single-cell differential expression analysis , 2014, Nature Methods.

[12]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[13]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[14]  W. Huber,et al.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 , 2014, Genome Biology.

[15]  A. Oudenaarden,et al.  Validation of noise models for single-cell transcriptomics , 2014, Nature Methods.

[16]  G. Andrew,et al.  arm: Data Analysis Using Regression and Multilevel/Hierarchical Models , 2014 .

[17]  Shawn M. Gillespie,et al.  Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma , 2014, Science.

[18]  P. Linsley,et al.  MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data , 2015, Genome Biology.

[19]  S. Linnarsson,et al.  Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq , 2015, Science.

[20]  E. Pierson,et al.  ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis , 2015, Genome Biology.

[21]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[22]  Evan Z. Macosko,et al.  Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets , 2015, Cell.

[23]  Rona S. Gertner,et al.  Single-Cell Genomics Unveils Critical Regulators of Th17 Cell Pathogenicity , 2015, Cell.

[24]  Catalina A. Vallejos,et al.  BASiCS: Bayesian Analysis of Single-Cell Sequencing Data , 2015, PLoS Comput. Biol..

[25]  Ole Winther,et al.  Ladder Variational Autoencoders , 2016, NIPS.

[26]  A. Regev,et al.  Revealing the vectors of cellular identity with single-cell genomics , 2016, Nature Biotechnology.

[27]  David M. Blei,et al.  Variational Inference: A Review for Statisticians , 2016, ArXiv.

[28]  Joseph L. Herman,et al.  Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis , 2015, Nature Methods.

[29]  Sandhya Prabhakaran,et al.  Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data , 2016, ICML.

[30]  Evan Z. Macosko,et al.  Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics , 2016, Cell.

[31]  Nir Yosef,et al.  FastProject: a tool for low-dimensional analysis of single-cell RNA-Seq data , 2016, BMC Bioinformatics.

[32]  Max Welling,et al.  The Variational Fair Autoencoder , 2015, ICLR.

[33]  M. Newton,et al.  SCnorm: robust normalization of single-cell RNA-seq data , 2017, Nature Methods.

[34]  Sandrine Dudoit,et al.  Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-Seq , 2017 .

[35]  Grace X. Y. Zheng,et al.  Massively parallel digital transcriptional profiling of single cells , 2016, Nature Communications.

[36]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[37]  A. Regev,et al.  Scaling single-cell genomics from phenomenology to mechanism , 2017, Nature.

[38]  Dongfang Wang,et al.  VASC: dimension reduction and visualization of single cell RNA sequencing data by deep variational autoencoder , 2017, bioRxiv.

[39]  Bo Wang,et al.  Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning , 2016, Nature Methods.

[40]  Fabian J Theis,et al.  SCANPY: large-scale single-cell gene expression data analysis , 2018, Genome Biology.

[41]  Fabian J Theis,et al.  The Human Cell Atlas , 2017, bioRxiv.

[42]  Kevin R. Moon,et al.  MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data , 2017, bioRxiv.

[43]  Jun Zhao,et al.  Removal of batch effects using distribution‐matching residual networks , 2016, Bioinform..

[44]  S. Dudoit,et al.  A general and flexible method for signal extraction from single-cell RNA-seq data , 2018, Nature Communications.

[45]  Andrew J. Hill,et al.  Single-cell mRNA quantification and differential analysis with Census , 2017, Nature Methods.

[46]  Z. Bar-Joseph,et al.  Using neural networks for reducing the dimensions of single-cell RNA-Seq data , 2017, Nucleic acids research.

[47]  Sandrine Dudoit,et al.  Normalizing single-cell RNA sequencing data: challenges and opportunities , 2017, Nature Methods.

[48]  H. Swerdlow,et al.  Large-scale simultaneous measurement of epitopes and transcriptomes in single cells , 2017, Nature Methods.

[49]  T. Mikkelsen,et al.  Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells , 2016, Nature Communications.

[50]  Anne Condon,et al.  Interpretable dimensionality reduction of single cell transcriptome data with deep generative models , 2017, Nature Communications.

[51]  L. Held,et al.  On p-Values and Bayes Factors , 2018 .

[52]  Laleh Haghverdi,et al.  Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors , 2018, Nature Biotechnology.

[53]  Samuel L. Wolock,et al.  Population Snapshots Predict Early Hematopoietic and Erythroid Hierarchies , 2018, Nature.

[54]  Fabian J. Theis,et al.  Single-cell RNA-seq denoising using a deep count autoencoder , 2018, Nature Communications.

[55]  Casper Kaae Sønderby,et al.  scVAE: Variational auto-encoders for single-cell gene expression data , 2018, bioRxiv.

[56]  J. Gu,et al.  Dimension reduction and visualization of single cell RNA sequencing data by deep variational autoencoder , 2019 .

[57]  Casper Kaae Sønderby,et al.  scVAE: variational auto-encoders for single-cell gene expression data , 2020, Bioinform..