scGMAAE: Gaussian mixture adversarial autoencoders for diversification analysis of scRNA-seq data

Advances in single-cell RNA sequencing (scRNA-seq) have produced a large volume of scRNA-seq data, which are widely used in biomedical research. Noise in the raw data and the tens of thousands of measured genes make it challenging to capture the true structure and useful information in scRNA-seq data. Most existing single-cell analysis methods assume that the low-dimensional embedding of the raw data follows a single Gaussian distribution, or lies in a low-dimensional nonlinear space without any prior information. Either assumption greatly limits the flexibility and controllability of the model. In addition, many existing methods have a high computational cost, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoders (scGMAAE). It assumes that the low-dimensional embeddings of different cell types follow different Gaussian distributions. It integrates Bayesian variational inference with adversarial training to give an interpretable latent representation of complex data and to discover the statistical distribution of each cell type. scGMAAE offers good controllability, interpretability and scalability, so it can process large-scale datasets in a short time and give competitive results. scGMAAE outperforms existing methods in several tasks, including dimensionality reduction visualization, cell clustering, differential expression analysis and batch effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to reach its best results.
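The modelling assumption above holds that each cell type occupies its own Gaussian component in latent space. The following minimal NumPy sketch makes that assumption concrete. It is not the authors' implementation. It assumes a diagonal-covariance mixture whose means, standard deviations and mixing weights are fixed by hand. The function names `sample_gmm_prior` and `responsibilities` are hypothetical.

In an adversarial autoencoder, codes drawn from such a prior would act as the "real" samples for the discriminator, and encoder outputs would act as the "fake" ones. The responsibilities show how a latent code could be assigned to a component, that is, to a putative cell type.

```python
import numpy as np


def sample_gmm_prior(n, means, stds, weights, rng):
    """Draw n latent codes from a diagonal-covariance Gaussian mixture.

    Hypothetical helper. means, stds: shape (K, d), one row per component
    (assumed here to correspond to one cell type each).
    weights: shape (K,), summing to 1.
    Returns z of shape (n, d) and the sampled component labels of shape (n,).
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Pick a component for each sample according to the mixing weights.
    labels = rng.choice(len(weights), size=n, p=weights)
    # Reparameterised draw: z = mu_k + sigma_k * eps, with eps ~ N(0, I).
    eps = rng.standard_normal((n, means.shape[1]))
    z = means[labels] + stds[labels] * eps
    return z, labels


def responsibilities(z, means, stds, weights):
    """Posterior p(component k | z) for each row of z (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Standardised differences, shape (n, K, d).
    diff = (z[:, None, :] - means[None, :, :]) / stds[None, :, :]
    # Log-density of each point under each diagonal Gaussian, shape (n, K).
    log_lik = -0.5 * np.sum(diff ** 2 + np.log(2 * np.pi * stds[None, :, :] ** 2), axis=2)
    log_joint = log_lik + np.log(np.asarray(weights, dtype=float))[None, :]
    # Subtract the row maximum before exponentiating (log-sum-exp trick).
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical, well-separated components in a 2-D latent space.
    means = [[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
    stds = [[0.5, 0.5]] * 3
    weights = [0.5, 0.3, 0.2]
    z, labels = sample_gmm_prior(1000, means, stds, weights, rng)
    assigned = responsibilities(z, means, stds, weights).argmax(axis=1)
    print("fraction of codes assigned to their sampling component:", (assigned == labels).mean())
```

With components this well separated, the argmax of the responsibilities should recover nearly all sampled labels. In a trained model, the component parameters would be learned from data rather than fixed by hand.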
