Dimensionality reduction and visualization of single-cell RNA-seq data with an improved deep variational autoencoder

Single-cell RNA sequencing (scRNA-seq) is a revolutionary technology that measures gene expression in individual cells, making it possible to decipher cell heterogeneity and identify subpopulations. However, because of technical limitations, scRNA-seq data are much noisier than traditional high-throughput RNA-seq data. As a result, many studies of dimensionality reduction and visualization for scRNA-seq data remain at the basic data-stacking stage. In this study, we propose an improved variational autoencoder model, termed DREAM, for dimensionality reduction and visual analysis of scRNA-seq data. DREAM combines a variational autoencoder with a Gaussian mixture model for cell type identification. It also explicitly models 'dropout' events through a zero-inflated layer. Together, these components yield a low-dimensional representation that captures the variation in the original scRNA-seq dataset. Benchmarking comparisons across nine scRNA-seq datasets show that, on average, DREAM outperforms four state-of-the-art methods. Moreover, we demonstrate that DREAM accurately captures the expression dynamics of human preimplantation embryonic development. DREAM is implemented in Python and is freely available on GitHub at https://github.com/Crystal-JJ/DREAM.
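The abstract names three building blocks: a variational autoencoder, a Gaussian mixture model over the latent space for cell type identification, and a zero-inflated layer for dropout. The sketch below is minimal and framework-free, and it illustrates those ideas under stated assumptions. It is not the DREAM implementation.

The following choices are assumptions made for illustration:

- The dropout probability follows the exponential-decay form p = exp(-λx²) used by ZIFA.
- The mixture components have diagonal covariances.
- The function names and the `lam` parameter are hypothetical.

```python
import math
import random
from typing import List, Optional

# Hypothetical sketch, NOT the DREAM source code. Function names, the
# dropout-rate parameter `lam`, and the diagonal-covariance mixture are
# illustrative assumptions.


def reparameterize(mu: List[float], log_var: List[float],
                   rng: random.Random) -> List[float]:
    """VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def zero_inflate(x_hat: List[float], lam: float = 1.0,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Apply an assumed ZIFA-style dropout, p_drop = exp(-lam * x**2).

    With rng=None, return the expected value x * (1 - p_drop); otherwise
    zero each entry independently with probability p_drop.
    """
    out = []
    for x in x_hat:
        p_drop = math.exp(-lam * x * x)  # low expression -> likely dropout
        if rng is None:
            out.append(x * (1.0 - p_drop))
        else:
            out.append(0.0 if rng.random() < p_drop else x)
    return out


def log_gaussian_diag(z: List[float], mean: List[float],
                      var: List[float]) -> float:
    """Log density of a diagonal Gaussian N(mean, diag(var)) at z."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (zi - m) ** 2 / v)
               for zi, m, v in zip(z, mean, var))


def gmm_responsibilities(z: List[float], weights: List[float],
                         means: List[List[float]],
                         variances: List[List[float]]) -> List[float]:
    """Posterior p(cluster k | z) under a Gaussian mixture in latent space."""
    log_probs = [math.log(w) + log_gaussian_diag(z, m, v)
                 for w, m, v in zip(weights, means, variances)]
    top = max(log_probs)  # log-sum-exp shift for numerical stability
    exps = [math.exp(lp - top) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]


# Example: an encoder output near the second of two cluster means.
rng = random.Random(0)
z = reparameterize([2.0, 2.0], [-4.0, -4.0], rng)
resp = gmm_responsibilities(z, weights=[0.5, 0.5],
                            means=[[-2.0, -2.0], [2.0, 2.0]],
                            variances=[[1.0, 1.0], [1.0, 1.0]])
cluster = max(range(len(resp)), key=resp.__getitem__)  # hard assignment
print(cluster, [round(r, 4) for r in resp])
```

In this sketch, the dropout probability approaches 1 as the expression value approaches zero. Lowly expressed genes are therefore the most likely to be zeroed, which is the intuition behind zero inflation.

A trainable model would need two further changes:

- The hard Bernoulli mask would be replaced by a differentiable relaxation, for example a Gumbel-softmax.
- The encoder, decoder, and mixture parameters would be learned jointly by maximizing an evidence lower bound.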
