SCRABBLE: single-cell RNA-seq imputation constrained by bulk RNA-seq data

Single-cell RNA-seq data contain a large proportion of zeros for expressed genes. Such dropout events present a fundamental challenge for various types of data analyses. Here, we describe the SCRABBLE algorithm to address this problem. SCRABBLE leverages bulk data as a constraint and reduces unwanted bias towards expressed genes during imputation. Using both simulation and several types of experimental data, we demonstrate that SCRABBLE outperforms the existing methods in recovering dropout events, capturing true distribution of gene expression across cells, and preserving gene-gene relationship and cell-cell relationship in the data.

[1]  Michael B. Elowitz,et al.  Dynamic Heterogeneity and DNA Methylation in Embryonic Stem Cells , 2014, Molecular cell.

[2]  Nancy R. Zhang,et al.  SAVER: Gene expression recovery for single-cell RNA sequencing , 2018, Nature Methods.

[3]  Lee E. Edsall,et al.  A map of the cis-regulatory sequences in the mouse genome , 2012, Nature.

[4]  Wotao Yin,et al.  On the Global and Linear Convergence of the Generalized Alternating Direction Method of Multipliers , 2016, J. Sci. Comput..

[5]  Davis J. McCarthy,et al.  Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation , 2012, Nucleic acids research.

[6]  S. Orkin,et al.  Mapping the Mouse Cell Atlas by Microwell-Seq , 2018, Cell.

[7]  Evan Z. Macosko,et al.  Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets , 2015, Cell.

[8]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[9]  Qing Nie,et al.  A multi-scale model for hair follicles reveals heterogeneous domains driving rapid spatiotemporal hair growth patterning , 2017, eLife.

[10]  Andreas Krämer,et al.  Causal analysis approaches in Ingenuity Pathway Analysis , 2013, Bioinform..

[11]  M. Kanehisa,et al.  The KEGG databases and tools facilitating omics analysis: latest developments involving human diseases and pharmaceuticals. , 2012, Methods in molecular biology.

[12]  Ralf Salomon,et al.  Evolutionary algorithms and gradient search: similarities and differences , 1998, IEEE Trans. Evol. Comput..

[13]  Henning Hermjakob,et al.  The Reactome pathway Knowledgebase , 2015, Nucleic acids research.

[14]  D. Cacchiarelli,et al.  Characterization of directed differentiation by high-throughput single-cell RNA-Seq , 2014, bioRxiv.

[15]  J. Marioni,et al.  Pooling across cells to normalize single-cell RNA sequencing data with many zero counts , 2016, Genome Biology.

[16]  R. Stewart,et al.  Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm , 2016, Genome Biology.

[17]  T. Mikkelsen,et al.  Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells , 2016, Nature Communications.

[18]  Robert S. Illingworth,et al.  Cell type-specific DNA methylation at intragenic CpG islands in the immune system. , 2011, Genome research.

[19]  I. Hellmann,et al.  Comparative Analysis of Single-Cell RNA Sequencing Methods , 2016, bioRxiv.

[20]  Il-Youp Kwak,et al.  DrImpute: imputing dropout events in single cell RNA sequencing data , 2017, BMC Bioinformatics.

[21]  Wei Vivian Li,et al.  An accurate and robust imputation method scImpute for single-cell RNA-seq data , 2018, Nature Communications.

[22]  S. Orkin,et al.  Mapping the Mouse Cell Atlas by Microwell-Seq , 2018, Cell.

[23]  E. Pierson,et al.  ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis , 2015, Genome Biology.

[24]  Kevin R. Moon,et al.  Recovering Gene Interactions from Single-Cell Data Using Data Diffusion , 2018, Cell.

[25]  Charlotte Soneson,et al.  Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications , 2018, Genome Biology.

[26]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[27]  Xiang Zhou,et al.  VIPER: variability-preserving imputation for accurate gene expression recovery in single-cell RNA sequencing studies , 2018, Genome Biology.

[28]  A. Oshlack,et al.  Splatter: simulation of single-cell RNA sequencing data , 2017, Genome Biology.