DrImpute: imputing dropout events in single cell RNA sequencing data

BackgroundThe single cell RNA sequencing (scRNA-seq) technique begin a new era by allowing the observation of gene expression at the single cell level. However, there is also a large amount of technical and biological noise. Because of the low number of RNA transcriptomes and the stochastic nature of the gene expression pattern, there is a high chance of missing nonzero entries as zero, which are called dropout events.ResultsWe develop DrImpute to impute dropout events in scRNA-seq data. We show that DrImpute has significantly better performance on the separation of the dropout zeros from true zeros than existing imputation algorithms. We also demonstrate that DrImpute can significantly improve the performance of existing tools for clustering, visualization and lineage reconstruction of nine published scRNA-seq datasets.ConclusionsDrImpute can serve as a very useful addition to the currently existing statistical tools for single cell RNA-seq analysis. DrImpute is implemented in R and is available at https://github.com/gongx030/DrImpute.

[1]  Russell B. Fletcher,et al.  Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics , 2017, BMC Genomics.

[2]  Peter Goos,et al.  Sequential imputation for missing values , 2007, Comput. Biol. Chem..

[3]  L. McMillan,et al.  A Fast Approximation to Multidimensional Scaling , 2006 .

[4]  Ambrose J. Carr,et al.  Bayesian Inference for Single-cell Clustering and Imputing , 2017 .

[5]  Russ B. Altman,et al.  Missing value estimation methods for DNA microarrays , 2001, Bioinform..

[6]  E. Shapiro,et al.  Single-cell sequencing-based technologies will revolutionize whole-organism science , 2013, Nature Reviews Genetics.

[7]  Tero Aittokallio,et al.  Improving missing value estimation in microarray data with gene ontology , 2006, Bioinform..

[8]  S. Dudoit,et al.  A general and flexible method for signal extraction from single-cell RNA-seq data , 2018, Nature Communications.

[9]  D. Sculley,et al.  Web-scale k-means clustering , 2010, WWW '10.

[10]  S. Linnarsson,et al.  Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. , 2011, Genome research.

[11]  S. Teichmann,et al.  Computational and analytical challenges in single-cell transcriptomics , 2015, Nature Reviews Genetics.

[12]  Gene H. Golub,et al.  Missing value estimation for DNA microarray gene expression data: local least squares imputation , 2005, Bioinform..

[13]  Grace X. Y. Zheng,et al.  Massively parallel digital transcriptional profiling of single cells , 2016, Nature Communications.

[14]  Joshua W. K. Ho,et al.  CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data , 2016, Genome Biology.

[15]  Mauricio Barahona,et al.  SC3 - consensus clustering of single-cell RNA-Seq data , 2016, Nature Methods.

[16]  Nir Yosef,et al.  A deep generative model for single-cell RNA sequencing with application to detecting differentially expressed genes , 2017, ArXiv.

[17]  Aleksandra A. Kolodziejczyk,et al.  The technology and biology of single-cell RNA sequencing. , 2015, Molecular cell.

[18]  P. Linsley,et al.  MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data , 2015, Genome Biology.

[19]  A. Regev,et al.  Revealing the vectors of cellular identity with single-cell genomics , 2016, Nature Biotechnology.

[20]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[21]  Sandhya Prabhakaran,et al.  Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data , 2016, ICML.

[22]  Hongkai Ji,et al.  TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis , 2016, Nucleic acids research.

[23]  R. Doerge,et al.  Statistical Design and Analysis of RNA Sequencing Data , 2010, Genetics.

[24]  M. Schaub,et al.  SC3 - consensus clustering of single-cell RNA-Seq data , 2016, Nature Methods.

[25]  Nicola K. Wilson,et al.  Resolving Early Mesoderm Diversification through Single Cell Expression Profiling , 2016, Nature.

[26]  Lorenz Wernisch,et al.  Pseudotime estimation: deconfounding single cell time series , 2015, bioRxiv.

[27]  Wuming Gong,et al.  Dpath software reveals hierarchical haemato-endothelial lineages of Etv2 progenitors based on single-cell transcriptome analysis , 2017, Nature Communications.

[28]  Jun Li,et al.  LEAP: constructing gene co‐expression networks for single‐cell RNA‐sequencing data using pseudotime ordering , 2016, Bioinform..

[29]  Ming Ouyang,et al.  Gaussian mixture clustering and imputation of microarray data , 2004, Bioinform..

[30]  Rhonda Bacher,et al.  Design and computational analysis of single-cell RNA-sequencing experiments , 2016, Genome Biology.

[31]  P. Kharchenko,et al.  Bayesian approach to single-cell differential expression analysis , 2014, Nature Methods.

[32]  Alex A. Pollen,et al.  Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex , 2014, Nature Biotechnology.

[33]  S. Linnarsson,et al.  Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing , 2014, Nature Neuroscience.

[34]  Catalin C. Barbacioru,et al.  mRNA-Seq whole-transcriptome analysis of a single cell , 2009, Nature Methods.

[35]  Fabian J Theis,et al.  Diffusion pseudotime robustly reconstructs lineage branching , 2016, Nature Methods.

[36]  Christopher Yau,et al.  pcaReduce: hierarchical clustering of single cell transcriptional profiles , 2015, BMC Bioinformatics.

[37]  Mikael Huss,et al.  Resolution of cell fate decisions revealed by single-cell gene expression analysis from zygote to blastocyst. , 2010, Developmental cell.

[38]  Kevin R. Moon,et al.  MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data , 2017, bioRxiv.

[39]  Cole Trapnell,et al.  The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells , 2014, Nature Biotechnology.

[40]  W. Huber,et al.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 , 2014, Genome Biology.

[41]  N. Neff,et al.  Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq , 2016, Nature.

[42]  Alvaro Plaza Reyes,et al.  Single-Cell RNA-Seq Reveals Lineage and X Chromosome Dynamics in Human Preimplantation Embryos , 2016, Cell.

[43]  Kay Elder,et al.  Defining the three cell lineages of the human blastocyst by single-cell RNA-seq , 2015, Development.

[44]  Bo Wang,et al.  Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning , 2016, Nature Methods.

[45]  B. Williams,et al.  From single-cell to cell-pool transcriptomes: Stochasticity in gene expression and RNA splicing , 2014, Genome research.

[46]  R. Sandberg,et al.  Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells , 2014, Science.

[47]  Wei Vivian Li,et al.  An accurate and robust imputation method scImpute for single-cell RNA-seq data , 2018, Nature Communications.

[48]  E. Pierson,et al.  ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis , 2015, Genome Biology.

[49]  Allon M. Klein,et al.  Single-Cell Analysis of Experience-Dependent Transcriptomic States in Mouse Visual Cortex , 2017, Nature Neuroscience.