Regulatory network-based imputation of dropouts in single-cell RNA sequencing data

Single-cell RNA sequencing (scRNA-seq) methods are typically unable to quantify the expression levels of all genes in a cell, creating a need for the computational prediction of missing values (‘dropout imputation’). Most existing dropout imputation methods are limited in that they rely exclusively on the scRNA-seq dataset at hand and do not exploit external gene-gene relationship information. Here, we show that a transcriptional regulatory network learned from external, independent gene expression data improves dropout imputation. Using a variety of human scRNA-seq datasets, we demonstrate that our network-based approach outperforms published state-of-the-art methods. It performs particularly well for lowly expressed genes, including cell-type-specific transcriptional regulators. Additionally, we tested a baseline approach in which missing values are imputed using the sample-wide average expression of a gene. Unexpectedly, up to 48% of the genes were better predicted using this baseline approach, suggesting negligible cell-to-cell variation of expression levels for many genes. Our work shows that there is no single best imputation method; rather, the best method depends on gene-specific features, such as expression level and expression variation across cells. We thus implemented an R package called ADImpute (available from https://github.com/anacarolinaleote/ADImpute) that automatically determines the best imputation method for each gene in a dataset.
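
The idea of choosing an imputation method per gene can be illustrated with a minimal Python sketch. It is not the ADImpute implementation or its API (ADImpute is an R package). The selection procedure shown here, which hides a subset of observed values and compares the prediction error of each method on them, is an assumption made for illustration, as are all function names, the `mask` matrix of held-out entries, and the `weights` matrix standing in for regulator-to-target coefficients learned from external data.

```python
import numpy as np


def gene_means(expr, mask):
    """Mean expression of each gene over cells that are not held out."""
    observed = np.where(mask, np.nan, expr)
    return np.nanmean(observed, axis=1, keepdims=True)


def baseline_predict(expr, mask):
    """Baseline: predict every entry with the gene's sample-wide average."""
    return np.broadcast_to(gene_means(expr, mask), expr.shape)


def network_predict(expr, mask, weights):
    """Network sketch: gene mean plus a linear combination of the centered
    expression of its regulators in the same cell.

    weights[i, j] is a hypothetical coefficient of regulator j on target i
    (zero diagonal), assumed to come from an external network.
    """
    means = gene_means(expr, mask)
    filled = np.where(mask, means, expr)  # held-out predictors fall back to the mean
    return means + weights @ (filled - means)


def masked_mse(pred, expr, mask):
    """Per-gene mean squared error on held-out entries (inf if none)."""
    sq_err = np.where(mask, (pred - expr) ** 2, 0.0)
    n_masked = mask.sum(axis=1)
    return np.divide(sq_err.sum(axis=1), n_masked,
                     out=np.full(expr.shape[0], np.inf), where=n_masked > 0)


def choose_method_per_gene(expr, mask, weights):
    """Pick, for each gene, the method with the lower held-out error.

    Ties and genes without held-out entries default to the baseline.
    """
    expr = np.asarray(expr, dtype=float)
    mse_base = masked_mse(baseline_predict(expr, mask), expr, mask)
    mse_net = masked_mse(network_predict(expr, mask, weights), expr, mask)
    return np.where(mse_net < mse_base, "network", "baseline")


# Toy data: 3 genes x 4 cells; gene 1 is driven by gene 0, gene 2 is constant.
toy_expr = np.array([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 4.0, 6.0, 8.0],
                     [5.0, 5.0, 5.0, 5.0]])
toy_mask = np.zeros(toy_expr.shape, dtype=bool)
toy_mask[1, [0, 3]] = True   # hold out two values of gene 1
toy_mask[2, 1] = True        # hold out one value of gene 2
toy_weights = np.zeros((3, 3))
toy_weights[1, 0] = 2.0      # hypothetical edge: gene 0 regulates gene 1
print(choose_method_per_gene(toy_expr, toy_mask, toy_weights))
```

In this toy setting, a gene whose expression tracks its regulator is better served by the network prediction, whereas a gene with no variation across cells gains nothing over its average, mirroring the observation that the best method depends on gene-specific features.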
