Deep generative model embedding of single-cell RNA-Seq profiles on hyperspheres and hyperbolic spaces

Single-cell RNA-Seq (scRNA-seq) has become an invaluable tool for studying biological systems in health and disease. Dimensionality reduction is a crucial step in interpreting the relations between cells based on scRNA-seq, but current methods are often hampered by “crowding” of cells in the center of the latent space, are biased by batch effects, or inadequately capture developmental relationships. Here, we introduce scPhere, a scalable deep generative model that embeds cells into low-dimensional hyperspherical or hyperbolic spaces to represent the data more accurately. scPhere resolves cell crowding, corrects multiple, complex batch factors, facilitates interactive visualization of large datasets, and gracefully uncovers pseudotemporal trajectories. We demonstrate scPhere on six large datasets from complex tissues in human patients or animal development, showing how it controls for both technical and biological factors and highlights complex cellular relations and biological insights.
