SIMPLEs: a single-cell RNA sequencing imputation strategy preserving gene modules and cell clusters variation

A main challenge in analyzing single-cell RNA sequencing (scRNASeq) data is to reduce technical variation while retaining cell heterogeneity. Because of the low mRNA content per cell and the loss of molecules during the experiment (called “dropout”), the gene expression matrix contains a substantial number of zero read counts. Existing imputation methods treat either every cell or every gene identically and independently, which oversimplifies the gene correlation and cell type structure. We propose a statistical model-based approach, called SIMPLEs, which iteratively identifies correlated gene modules and cell clusters and imputes dropouts in a way customized to each gene module and cell type. Simultaneously, it quantifies the uncertainty of the imputation and of the cell clustering. Optionally, SIMPLEs can integrate bulk RNASeq data to estimate dropout rates. In simulations, SIMPLEs performed significantly better than prevailing scRNASeq imputation methods by various metrics. By applying SIMPLEs to several real data sets, we discovered gene modules that can further classify subtypes of cells. Our imputations successfully recovered the expression trends of marker genes in stem cell differentiation and can reveal putative pathways regulating biological processes.
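To make the notions of dropout and cluster-aware imputation concrete, the following is a minimal toy sketch in Python. It is not the SIMPLEs model: it assumes that cell cluster labels are already known, ignores gene modules and uncertainty estimates, and simply fills each zero with the mean of the nonzero values of the same gene within the same cluster. The function names `simulate_dropout` and `impute_by_cluster`, the dropout rate of 0.3, and the synthetic expression values are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_dropout(expr, rate, rng):
    """Zero out each entry independently with probability `rate`.

    Returns the observed matrix and the boolean mask of dropped entries.
    """
    mask = rng.random(expr.shape) < rate
    observed = expr.copy()
    observed[mask] = 0.0
    return observed, mask


def impute_by_cluster(observed, labels):
    """Toy imputation (an assumption, not SIMPLEs): replace every zero with
    the mean of the nonzero values of that gene within the same cell cluster.

    `observed` is a cells x genes matrix of nonnegative values and `labels`
    holds one cluster label per cell.
    """
    imputed = observed.copy()
    for k in np.unique(labels):
        rows = np.where(labels == k)[0]
        block = observed[rows]
        nonzero = block > 0
        counts = nonzero.sum(axis=0)
        sums = block.sum(axis=0)  # zeros add nothing to the sum
        # Per-gene mean over nonzero entries; stays 0 if a gene is all zero.
        means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
        imputed[rows] = np.where(nonzero, block, means)
    return imputed


# Synthetic data: 100 cells in two clusters, 20 genes, with cluster-specific
# expression levels (about 5.5 in cluster 0 and about 1.5 in cluster 1).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
true_expr = np.where(labels[:, None] == 0, 5.0, 1.0) + rng.random((100, 20))

observed, mask = simulate_dropout(true_expr, 0.3, rng)
imputed = impute_by_cluster(observed, labels)

# Mean absolute error on the dropped entries, before and after imputation.
err_observed = np.abs(observed - true_expr)[mask].mean()
err_imputed = np.abs(imputed - true_expr)[mask].mean()
```

In this toy setting the cluster-wise fill recovers dropped values more closely than leaving them at zero, because cells in the same cluster share an expression level. The approach described above goes further by inferring the clusters and gene modules jointly with the imputation rather than taking them as given.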
