Disentangling latent representations of single cell RNA-seq experiments

Single cell RNA sequencing (scRNA-seq) enables transcriptional profiling at the resolution of individual cells. These experiments measure features at the level of transcripts, but biological processes of interest often involve the coordinated activity of many transcripts, so it can be difficult to extract interpretable insights directly from transcript-level cell profiles. Latent representations that capture biological variation in a smaller number of dimensions are therefore useful for interpreting many experiments. Variational autoencoders (VAEs) have emerged as a tool for scRNA-seq denoising and data harmonization, but the correspondence between latent dimensions in these models and generative factors remains unexplored. Here, we explore training VAEs with modifications to the objective function (i.e. β-VAE) to encourage disentanglement and make latent representations of scRNA-seq data more interpretable. Using simulated data, we find that VAE latent dimensions correspond more directly to the generative factors of the data when these modified objective functions are used. Applied to experimental data from stimulated peripheral blood mononuclear cells, the modified objectives yield better correspondence of latent dimensions to experimental factors and cell identity programs, but impaired performance on cell type clustering.

Publication Status: This pre-print represents the final output of a preliminary research direction and will not be updated or published in an archival journal. We are happy to discuss future directions we believe to be promising with any interested researchers.
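As an illustration of the kind of objective modification the abstract refers to, the β-VAE loss can be sketched as a reconstruction term plus a KL term scaled by a weight β, where β = 1 recovers the standard VAE objective and β > 1 puts more pressure on the latent code to match the prior. The sketch below is a minimal NumPy version under stated assumptions: a diagonal Gaussian posterior with a standard normal prior, and a per-cell reconstruction negative log-likelihood supplied by the caller. The function names `gaussian_kl` and `beta_vae_loss` are hypothetical, and the abstract does not state which likelihood or which value of β the authors used.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) per cell.

    mu, logvar: arrays of shape (n_cells, n_latent).
    Returns an array of shape (n_cells,), summed over latent dimensions.
    """
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)


def beta_vae_loss(recon_nll, mu, logvar, beta=1.0):
    """Mean over cells of (reconstruction NLL + beta * KL).

    recon_nll: array of shape (n_cells,), the reconstruction negative
        log-likelihood under whatever decoder likelihood is assumed
        (the likelihood itself is outside the scope of this sketch).
    beta: weight on the KL term; beta = 1 gives the standard negative ELBO.
    """
    return float(np.mean(recon_nll + beta * gaussian_kl(mu, logvar)))


if __name__ == "__main__":
    # Toy values only; these are not data or results from the paper.
    mu = np.ones((1, 2))
    logvar = np.zeros((1, 2))
    recon_nll = np.array([2.0])
    print(beta_vae_loss(recon_nll, mu, logvar, beta=1.0))
    print(beta_vae_loss(recon_nll, mu, logvar, beta=4.0))
```

Raising `beta` leaves the reconstruction term unchanged and increases only the penalty on the posterior's divergence from the prior, which is the trade-off behind the abstract's finding of improved factor correspondence alongside impaired clustering.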
