scCDG: A Method Based on DAE and GCN for scRNA-Seq Data Analysis

Identifying cell types is one of the main goals of single-cell RNA sequencing (scRNA-seq) analysis, and clustering is a common method for this task. However, the massive amount of data and the high noise level pose challenges for single-cell clustering. To address these challenges, in this paper we introduce a novel method named single-cell clustering based on denoising autoencoder and graph convolution network (scCDG), which consists of two core models. The first model is a denoising autoencoder (DAE) that fits the data distribution in order to denoise the data. The second model is a graph autoencoder built on a graph convolution network (GCN), which projects the data into a compressed low-dimensional space while simultaneously preserving the topological structure information and the feature information in the scRNA-seq data. Extensive analysis on seven real scRNA-seq datasets demonstrates that scCDG outperforms state-of-the-art methods in some research sub-fields, including single-cell clustering, visualization of the transcriptome landscape, and trajectory inference.
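To make the two components concrete, the following is a minimal NumPy sketch of the building blocks such a pipeline might use: input corruption for a denoising autoencoder, a k-nearest-neighbour cell graph, one GCN encoding layer, and an inner-product decoder that reconstructs the graph. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the function names, the masking noise, the Euclidean k-NN graph, the single untrained layer, and all sizes are hypothetical and are not the authors' implementation.

```python
import numpy as np


def corrupt(x, drop_prob, rng):
    """Masking noise for a denoising autoencoder: zero each entry with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep


def knn_adjacency(x, k):
    """Symmetric k-nearest-neighbour adjacency (no self-loops) from squared Euclidean distances."""
    sq = np.sum(x ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(dist, np.inf)          # a cell is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    n = x.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)           # symmetrize


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} commonly used in GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_encode(a_norm, x, w):
    """One graph-convolution layer with ReLU: Z = ReLU(A_norm X W)."""
    return np.maximum(a_norm @ x @ w, 0.0)


def decode_adjacency(z):
    """Inner-product decoder: edge probabilities sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))


# Toy data: 50 hypothetical cells x 20 hypothetical genes, log-transformed counts.
rng = np.random.default_rng(0)
x = np.log1p(rng.poisson(2.0, size=(50, 20)).astype(float))

x_noisy = corrupt(x, drop_prob=0.2, rng=rng)   # a DAE would be trained to recover x from x_noisy
adj = knn_adjacency(x, k=5)
a_norm = normalize_adjacency(adj)
w = rng.normal(scale=0.1, size=(20, 8))        # untrained weights, for shape illustration only
z = gcn_encode(a_norm, x, w)                   # low-dimensional embedding per cell
adj_rec = decode_adjacency(z)                  # reconstructed graph
```

In a trained model the weights `w` would be fitted so that `adj_rec` matches `adj` (and, in a denoising setup, so that the reconstruction from `x_noisy` matches `x`); the embedding `z` would then be passed to a clustering or visualization step.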
