A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements

Spatial studies of the transcriptome provide biologists with gene expression maps of heterogeneous and complex tissues. However, most experimental protocols for spatial transcriptomics cannot measure the entire transcriptome and instead require selecting in advance a small fraction of genes to quantify. Standard single-cell RNA sequencing (scRNA-seq) is more prevalent, easier to implement and can in principle capture any gene, but it cannot recover the spatial location of the cells. In this manuscript, we focus on the problem of imputing missing genes in spatial transcriptomic data based on (unpaired) standard scRNA-seq data from the same biological tissue. Building upon domain adaptation work, we propose gimVI, a deep generative model for the integration of spatial transcriptomic data and scRNA-seq data that can be used to impute missing genes. After describing our generative model and an inference procedure for it, we compare gimVI to alternative methods from computational biology and domain adaptation on real datasets and show that it outperforms Seurat Anchors, Liger and CORAL at imputing held-out genes.
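To make the imputation task concrete, the sketch below shows a simple nearest-neighbor baseline: each spatial cell borrows the expression of unmeasured genes from the scRNA-seq cells that are closest to it on the genes both datasets share. This is not gimVI, nor an implementation of Seurat Anchors, Liger or CORAL. The function name `knn_impute`, the log1p transform and the choice of `k` are assumptions made purely for illustration.

```python
import numpy as np


def knn_impute(spatial_shared, seq_shared, seq_target, k=5):
    """Hypothetical k-NN baseline for imputing genes missing from spatial data.

    spatial_shared: (n_spatial, g) counts for genes measured in both datasets
    seq_shared:     (n_seq, g) counts for the same genes in the scRNA-seq data
    seq_target:     (n_seq, m) counts for genes measured only by scRNA-seq
    Returns an (n_spatial, m) array of imputed expression values.
    """
    # log1p damps the influence of highly expressed genes (an assumption,
    # not a step taken from the paper).
    a = np.log1p(np.asarray(spatial_shared, dtype=float))
    b = np.log1p(np.asarray(seq_shared, dtype=float))

    # Squared Euclidean distance between every spatial cell and every
    # scRNA-seq cell, computed on the shared genes only.
    d2 = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T

    # Indices of the k closest scRNA-seq cells for each spatial cell.
    nn = np.argsort(d2, axis=1)[:, :k]

    # Average the neighbors' expression of the unmeasured genes.
    return np.asarray(seq_target, dtype=float)[nn].mean(axis=1)


# Toy usage: three scRNA-seq cells, two spatial cells, one gene to impute.
seq_shared = np.array([[0, 0], [10, 10], [100, 0]])
seq_target = np.array([[1], [2], [3]])
spatial_shared = np.array([[9, 11], [90, 1]])
imputed = knn_impute(spatial_shared, seq_shared, seq_target, k=1)
```

A held-out evaluation in the spirit of the abstract would hide one of the shared genes from the spatial data, impute it this way, and compare the imputed values with the measured ones.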
