Single-cell RNA-seq denoising using a deep count autoencoder

Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods are needed for increasingly large but sparse scRNA-seq data. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a negative binomial noise model with or without zero-inflation, and it captures nonlinear gene-gene dependencies. Our method scales linearly with the number of cells and can, therefore, be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.

Single-cell RNA sequencing is a powerful method to study gene expression, but noise in the data can obstruct analysis. Here the authors develop a denoising method based on a deep count autoencoder network that scales linearly with the number of cells and is therefore compatible with large datasets.
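As a minimal illustration of the noise model named in the abstract, the sketch below evaluates a zero-inflated negative binomial (ZINB) log-likelihood for a single gene. It is not the authors' implementation: the parameter names `mu` (mean), `theta` (inverse dispersion) and `pi` (zero-inflation probability) and all function names are assumptions chosen for this example, and in a count autoencoder such parameters would be produced by the network rather than fixed by hand. Setting `pi = 0` recovers the plain negative binomial case, which corresponds to the "without zero-inflation" variant.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and inverse dispersion theta > 0 (hypothetical names)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.
    pi in [0, 1) is the probability of an extra (dropout) zero."""
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # A positive count can only come from the NB component.
    return math.log(1.0 - pi) + nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts sharing one parameter set."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)


if __name__ == "__main__":
    counts = [0, 0, 0, 1, 0, 4, 0, 2]
    print("NB   NLL:", zinb_nll(counts, mu=1.0, theta=1.5, pi=0.0))
    print("ZINB NLL:", zinb_nll(counts, mu=1.0, theta=1.5, pi=0.3))
```

Minimizing such a negative log-likelihood over the parameters is one way a count-based reconstruction loss can replace the mean squared error of a conventional autoencoder; the denoised value for a gene would then typically be read from the fitted mean `mu`.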
