Interpretable dimensionality reduction of single cell transcriptome data with deep generative models

Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.Although single-cell transcriptome data are increasingly available, their interpretation remains a challenge. Here, the authors present a dimensionality reduction approach that preserves both the local and global neighbourhood structures in the data thus enhancing its interpretability.

[1]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[2]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[3]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[4]  Geoffrey E. Hinton,et al.  Stochastic Neighbor Embedding , 2002, NIPS.

[5]  Neil D. Lawrence,et al.  Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data , 2003, NIPS.

[6]  Gordon K. Smyth,et al.  limma: Linear Models for Microarray Data , 2005 .

[7]  Geoffrey E. Hinton,et al.  Visualizing Similarity Data with a Mixture of Maps , 2007, AISTATS.

[8]  J. Davis Bioinformatics and Computational Biology Solutions Using R and Bioconductor , 2007 .

[9]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[10]  Laurens van der Maaten,et al.  Learning a Parametric Embedding by Preserving Local Structure , 2009, AISTATS.

[11]  Miguel Á. Carreira-Perpiñán,et al.  The Elastic Embedding Algorithm for Dimensionality Reduction , 2010, ICML.

[12]  J. Troge,et al.  Tumour evolution inferred by single-cell sequencing , 2011, Nature.

[13]  Sean C. Bendall,et al.  Single-Cell Mass Cytometry of Differential Immune and Drug Responses Across a Human Hematopoietic Continuum , 2011, Science.

[14]  Günter P. Wagner,et al.  Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples , 2012, Theory in Biosciences.

[15]  Samuel Kaski,et al.  Scalable Optimization of Neighbor Embedding for Visualization , 2013, ICML.

[16]  Sean C. Bendall,et al.  viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia , 2013, Nature Biotechnology.

[17]  Laurens van der Maaten,et al.  Accelerating t-SNE using tree-based algorithms , 2014, J. Mach. Learn. Res..

[18]  Gioele La Manno,et al.  Quantitative single-cell RNA-seq with unique molecular identifiers , 2013, Nature Methods.

[19]  Cole Trapnell,et al.  The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells , 2014, Nature Biotechnology.

[20]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[21]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[22]  I. Amit,et al.  Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types , 2014, Science.

[23]  Åsa K. Björklund,et al.  Full-length RNA-seq from single cells using Smart-seq2 , 2014, Nature Protocols.

[24]  Shawn M. Gillespie,et al.  Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma , 2014, Science.

[25]  Sean C. Bendall,et al.  Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis , 2015, Cell.

[26]  C. Ponting,et al.  G&T-seq: parallel sequencing of single-cell genomes and transcriptomes , 2015, Nature Methods.

[27]  E. Pierson,et al.  ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis , 2015, Genome Biology.

[28]  Allon M. Klein,et al.  Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells , 2015, Cell.

[29]  Evan Z. Macosko,et al.  Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets , 2015, Cell.

[30]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[31]  S. Teichmann,et al.  Computational and analytical challenges in single-cell transcriptomics , 2015, Nature Reviews Genetics.

[32]  Fabian J Theis,et al.  Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells , 2015, Nature Biotechnology.

[33]  Anne Condon,et al.  densityCut: an efficient and versatile topological approach for automatic clustering of biological data , 2016, Bioinform..

[34]  Mariella G. Filbin,et al.  Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma , 2016, Nature.

[35]  Martin Wattenberg,et al.  How to Use t-SNE Effectively , 2016 .

[36]  A. Regev,et al.  Revealing the vectors of cellular identity with single-cell genomics , 2016, Nature Biotechnology.

[37]  Christopher Yau,et al.  pcaReduce: hierarchical clustering of single cell transcriptional profiles , 2015, BMC Bioinformatics.

[38]  Fabian J. Theis,et al.  destiny: diffusion maps for large-scale single-cell data in R , 2015, Bioinform..

[39]  Evan Z. Macosko,et al.  Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics , 2016, Cell.

[40]  Nir Yosef,et al.  FastProject: a tool for low-dimensional analysis of single-cell RNA-Seq data , 2016, BMC Bioinformatics.

[41]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[42]  Charles H. Yoon,et al.  Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq , 2016, Science.

[43]  Sean C. Bendall,et al.  Wishbone identifies bifurcating developmental trajectories from single-cell data , 2016, Nature Biotechnology.

[44]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[45]  Shuqiang Li,et al.  CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq , 2016, Genome Biology.

[46]  M. Newton,et al.  SCnorm: robust normalization of single-cell RNA-seq data , 2017, Nature Methods.

[47]  Richard A. Muscat,et al.  Scaling single cell transcriptomics through split pool barcoding , 2017, bioRxiv.

[48]  Grace X. Y. Zheng,et al.  Massively parallel digital transcriptional profiling of single cells , 2016, Nature Communications.

[49]  J. C. Love,et al.  Seq-Well: A Portable, Low-Cost Platform for High-Throughput Single-Cell RNA-Seq of Low-Input Samples , 2017, Nature Methods.

[50]  Andrew C. Adey,et al.  Single-Cell Transcriptional Profiling of a Multicellular Organism , 2017 .

[51]  Andrew C. Adey,et al.  Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing , 2017 .

[52]  Bo Wang,et al.  Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning , 2016, Nature Methods.

[53]  Valentine Svensson,et al.  Power Analysis of Single Cell RNA-Sequencing Experiments , 2016, Nature Methods.

[54]  Fabian J Theis,et al.  The Human Cell Atlas , 2017, bioRxiv.

[55]  Kieran R. Campbell,et al.  Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers , 2017, Wellcome open research.

[56]  Yi Yao,et al.  Gating mass cytometry data by deep learning , 2016, bioRxiv.

[57]  Andrew J. Hill,et al.  Single-cell mRNA quantification and differential analysis with Census , 2017, Nature Methods.

[58]  I. Hellmann,et al.  Comparative Analysis of Single-Cell RNA Sequencing Methods , 2016, bioRxiv.

[59]  Sandrine Dudoit,et al.  Normalizing single-cell RNA sequencing data: challenges and opportunities , 2017, Nature Methods.

[60]  Richard A. Muscat,et al.  Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding , 2018, Science.

[61]  Russell B. Fletcher,et al.  Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics , 2017, BMC Genomics.