Gene expression recovery for single cell RNA sequencing

Rapid advances in massively parallel single-cell RNA sequencing (scRNA-seq) are paving the way for high-resolution single-cell profiling of biological samples. In most scRNA-seq studies, only a small fraction of the transcripts present in each cell are sequenced. The efficiency, that is, the proportion of transcripts in the cell that are sequenced, can be especially low in highly parallelized experiments, where the number of reads allocated to each cell is small. This leads to unreliable quantification of lowly and moderately expressed genes, producing extremely sparse data and hindering downstream analysis. To address this challenge, we introduce SAVER (Single-cell Analysis Via Expression Recovery), an expression recovery method for scRNA-seq that borrows information across genes and cells to impute the zeros and to improve the expression estimates for all genes. We show, by comparison to RNA fluorescence in situ hybridization (FISH) and by data down-sampling experiments, that SAVER reliably recovers cell-specific gene expression concentrations, cross-cell gene expression distributions, and gene-to-gene and cell-to-cell correlations. This improves the power and accuracy of any downstream analysis involving genes with low to moderate expression.
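To make the link between low efficiency and sparsity concrete, the following minimal sketch simulates down-sampling by binomial thinning: each transcript in a cell is kept independently with probability equal to the efficiency. This is only an illustration under stated assumptions, not the SAVER method or the down-sampling procedure used in the study; the function names, the uniform range of true counts, and the 5% efficiency are all hypothetical choices.

```python
import random


def downsample(true_counts, efficiency, rng):
    """Binomial thinning: keep each transcript independently with
    probability `efficiency` (an assumed capture model, for illustration)."""
    return [
        sum(1 for _ in range(n) if rng.random() < efficiency)
        for n in true_counts
    ]


def zero_fraction(counts):
    """Fraction of cells in which the gene has an observed count of zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


rng = random.Random(0)

# Hypothetical true transcript counts of one lowly expressed gene in 1000 cells.
true_counts = [rng.randint(1, 20) for _ in range(1000)]

# Hypothetical efficiency of 5%: most transcripts are never sequenced.
observed = downsample(true_counts, efficiency=0.05, rng=rng)

print("zero fraction before down-sampling:", zero_fraction(true_counts))
print("zero fraction after down-sampling: ", zero_fraction(observed))
```

Although every simulated cell expresses the gene, a majority of the observed counts come out as zero at this efficiency, which is the kind of sparsity an expression recovery method has to contend with.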
