I-Impute: a self-consistent method to impute single cell RNA sequencing data

Single-cell RNA sequencing (scRNA-seq) is essential for studying cell-specific transcriptome landscapes. Because of “dropout” events, scRNA-seq techniques capture only a small fraction of the genes. Dropout events therefore receive intensive attention in scRNA-seq data analysis, and imputation tools have been proposed to estimate the values of dropout events and de-noise the data. To evaluate these tools, researchers have developed various clustering criteria that rely on ground-truth cell subgroup labels. Evaluation measures that do not require cell subgroup knowledge are lacking. A reliable imputation tool should follow the “self-consistency” principle; that is, the tool reports its results only when it finds no further errors or dropouts in the data. Here, we propose “self-consistency” as an explicit evaluation criterion, and we propose I-Impute, a “self-consistent” method, to impute scRNA-seq data. I-Impute leverages continuous similarities and dropout probabilities and refines the data iteratively to make the final output self-consistent. On the in silico datasets, I-Impute consistently exhibited the highest Pearson correlations across different dropout rates compared with the state-of-the-art methods SAVER and scImpute. On the datasets with zero rates of 90.87%, 70.98%, and 56.65%, the correlations between ground-truth entries and predicted values were 0.78, 0.90, and 0.94 for I-Impute, 0.58, 0.79, and 0.88 for SAVER, and 0.65, 0.86, and 0.93 for scImpute, respectively. Furthermore, we collected three wet-lab datasets (mouse bladder cells, embryonic stem cells, and aortic leukocyte cells) to evaluate the tools. I-Impute exhibited feasible cell subpopulation discovery efficacy on all three datasets. It achieved the highest clustering accuracy compared with SAVER and scImpute; that is, I-Impute obtained adjusted Rand indices of 0.61, 0.7, and 0.52 on the three datasets, improving on SAVER by 0.01 to 0.17 and on scImpute by 0.19 to 0.4. I-Impute also raised the normalized mutual information on the three datasets by 0.01 to 0.09 compared with SAVER and by 0.15 to 0.34 compared with scImpute. I-Impute exhibits robust imputation ability and follows the “self-consistency” principle. It offers insight into the underlying cell subtypes in real scRNA-seq data. Source code of I-Impute can be accessed at https://github.com/xikanfeng2/I-Impute.
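
The “self-consistency” criterion and the Pearson-correlation evaluation described above can be illustrated with a minimal sketch. It is not the I-Impute algorithm: `toy_impute` is a hypothetical stand-in imputer that simply overwrites flagged dropout entries with the current mean of their gene, and the names `refine_until_self_consistent`, `pearson_on_dropouts`, `tol`, and `max_iter` are assumptions introduced only for illustration. The sketch shows the general idea of refining iteratively until a further imputation pass no longer changes the output, and of scoring imputed values against ground truth on the dropout entries.

```python
import numpy as np


def toy_impute(x, dropout_mask):
    """Hypothetical stand-in imputer (NOT I-Impute): overwrite every entry
    flagged as dropout with the current mean of its gene (row)."""
    row_means = x.mean(axis=1, keepdims=True)
    return np.where(dropout_mask, row_means, x)


def refine_until_self_consistent(x, dropout_mask, tol=1e-8, max_iter=1000):
    """Re-apply the imputer until one more pass changes no entry by more
    than `tol`, i.e. until the output is (approximately) a fixed point."""
    current = np.asarray(x, dtype=float)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        updated = toy_impute(current, dropout_mask)
        change = np.max(np.abs(updated - current))
        current = updated
        if change < tol:
            break
    return current, n_iter


def pearson_on_dropouts(truth, imputed, dropout_mask):
    """Pearson correlation between ground-truth and imputed values,
    restricted to the entries that were dropped out."""
    return np.corrcoef(truth[dropout_mask], imputed[dropout_mask])[0, 1]


# Toy genes-by-cells matrix with simulated dropouts (illustrative values only).
truth = np.array([[4.0, 6.0, 5.0, 5.0],
                  [10.0, 12.0, 11.0, 11.0],
                  [1.0, 2.0, 3.0, 2.0]])
dropout_mask = np.array([[False, False, True, False],
                         [False, True, False, False],
                         [True, False, False, False]])
observed = np.where(dropout_mask, 0.0, truth)

imputed, n_iter = refine_until_self_consistent(observed, dropout_mask)

# Self-consistency check: one more imputation pass leaves the output unchanged.
is_self_consistent = np.allclose(toy_impute(imputed, dropout_mask), imputed, atol=1e-6)
print("passes:", n_iter, "self-consistent:", is_self_consistent)
print("Pearson on dropout entries:", pearson_on_dropouts(truth, imputed, dropout_mask))
```

With this toy imputer the fixed point is simply the mean of the observed values of each gene; a real method would replace `toy_impute` with a similarity- and dropout-probability-aware update, but the stopping rule and the evaluation on dropout entries would keep the same shape.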
