Comparison of Computational Methods for Imputing Single-Cell RNA-Sequencing Data

Single-cell RNA-sequencing (scRNA-seq) is a recent breakthrough technology that measures RNA levels at single-cell resolution, making it possible to study biological functions precisely. One of the main challenges in analyzing scRNA-seq data is the presence of zeros, or dropout events, which may mislead downstream analyses. To compensate for the dropout effect, several methods have been developed to impute gene expression since the first Bayesian method was proposed in 2016. However, these methods differ widely in their model assumptions and imputation performance, so a large-scale comparison and evaluation is urgently needed. To this end, we compared eight imputation methods, evaluated their power in recovering the original real data, and performed broad analyses to explore their effects on clustering cell types, detecting differentially expressed genes, and reconstructing lineage trajectories, using both simulated and real data. Simulated datasets and case studies highlight that no single method performs best in all situations. Some shortcomings of these methods, such as limited scalability, limited robustness, and unavailability in some situations, need to be addressed in future studies.
