Imputing single-cell RNA-seq data by combining graph convolution and autoencoder neural networks

Single-cell RNA sequencing technology promotes the profiling of single-cell transcriptomes at an unprecedented throughput and resolution. However, in scRNA-seq studies, only a low amount of sequenced mRNA in each cell leads to missing detection for a portion of mRNA molecules, i.e. the dropout problem. The dropout event hinders various downstream analysis, such as clustering analysis, differential expression analysis, and inference of gene-to-gene relationships. Therefore, it is necessary to develop robust and effective imputation methods for the increasing scRNA-seq data. In this study, we have developed an imputation method (GraphSCI) to impute the dropout events in scRNA-seq data based on the graph convolution networks. The method takes advantage of low-dimensional representations of similar cells and gene-gene interactions to impute the dropouts. Extensive experiments demonstrated that GraphSCI outperforms other state-of-the-art methods for imputation on both simulated and real scRNA-seq data. Meanwhile, GraphSCI is able to accurately infer gene-to-gene relationships by utilizing the imputed matrix that are concealed by dropout events in raw data.

[1]  Wei Vivian Li,et al.  An accurate and robust imputation method scImpute for single-cell RNA-seq data , 2018, Nature Communications.

[2]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[3]  Nancy R. Zhang,et al.  SAVER: Gene expression recovery for single-cell RNA sequencing , 2018, Nature Methods.

[4]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[5]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[6]  M. Cugmas,et al.  On comparing partitions , 2015 .

[7]  Deepayan Chakrabarti,et al.  Joint Inference of Multiple Label Types in Large Networks , 2014, ICML.

[8]  Aleksandra A. Kolodziejczyk,et al.  The technology and biology of single-cell RNA sequencing. , 2015, Molecular cell.

[9]  A. Saliba,et al.  Single-cell RNA-seq: advances and future challenges , 2014, Nucleic acids research.

[10]  Giovanni Iacono,et al.  Single-Cell Transcriptomics Unveils Gene Regulatory Network Plasticity , 2018 .

[11]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[12]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[13]  Ben Lehner,et al.  The effects of genetic variation on gene expression dynamics during development , 2013, Nature.

[14]  H. Heyn,et al.  Single-cell transcriptomics unveils gene regulatory network plasticity , 2018, Genome Biology.

[15]  S. Dudoit,et al.  A general and flexible method for signal extraction from single-cell RNA-seq data , 2018, Nature Communications.

[16]  S. Horvath,et al.  Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing , 2013, Nature.

[17]  Damian Szklarczyk,et al.  The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible , 2016, Nucleic Acids Res..

[18]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[19]  Kevin R. Moon,et al.  Recovering Gene Interactions from Single-Cell Data Using Data Diffusion , 2018, Cell.

[20]  Grace X. Y. Zheng,et al.  Massively parallel digital transcriptional profiling of single cells , 2016, Nature Communications.

[21]  Juan Carlos Fernández,et al.  Multiobjective evolutionary algorithms to identify highly autocorrelated areas: the case of spatial distribution in financially compromised farms , 2014, Ann. Oper. Res..

[22]  A. Oshlack,et al.  Splatter: simulation of single-cell RNA sequencing data , 2017, Genome Biology.

[23]  Allon M. Klein,et al.  Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells , 2015, Cell.

[24]  Trey Ideker,et al.  Cytoscape 2.8: new features for data integration and network visualization , 2010, Bioinform..

[25]  Xiangliang Zhang,et al.  Co-Embedding Attributed Networks , 2019, WSDM.

[26]  J. Aerts,et al.  SCENIC: Single-cell regulatory network inference and clustering , 2017, Nature Methods.

[27]  D. Steinley Journal of Classification , 2004, Vegetatio.

[28]  Paul Geladi,et al.  Principal Component Analysis , 1987, Comprehensive Chemometrics.

[29]  Ling Huang,et al.  Joint Link Prediction and Attribute Inference Using a Social-Attribute Network , 2014, TIST.

[30]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[31]  P. Kharchenko,et al.  Bayesian approach to single-cell differential expression analysis , 2014, Nature Methods.

[32]  Valentine Svensson,et al.  Power Analysis of Single Cell RNA-Sequencing Experiments , 2016, Nature Methods.

[33]  Lin Zhong,et al.  Bi-directional Joint Inference for User Links and Attributes on Large Social Graphs , 2017, WWW.

[34]  Fabian J Theis,et al.  Single-cell RNA-seq denoising using a deep count autoencoder , 2019, Nature Communications.