Evaluation of Cell Type Deconvolution R Packages on Single Cell RNA-seq Data

Annotating cell types is a critical step in single cell RNA-Seq (scRNA-Seq) data analysis. Several supervised and semi-supervised classification methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations of these methods that could provide practical guidelines are lacking. Moreover, it is not clear whether classification methods originally designed for analyzing other bulk omics data can be adapted to scRNA-Seq analysis. In this study, we evaluated ten cell-type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single cell research (Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, SCINA). The other two are repurposed from the deconvolution of DNA methylation data: Linear Constrained Projection (CP) and Robust Partial Correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-Seq datasets as well as simulated data. We assessed accuracy through intra-dataset and inter-dataset predictions; robustness to practical challenges such as gene filtering, high similarity among cell types, and an increased number of classification labels; and the ability to detect rare and unknown cell types. Overall, methods such as Seurat, SingleR, CP, RPC and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Seurat, SingleR, CP and RPC were also more robust against down-sampling. However, Seurat has a major drawback in predicting rare cell populations, and it is suboptimal at differentiating cell types that are highly similar to each other, whereas SingleR and RPC perform much better in these respects. All code and data are available at: https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.
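The contrast between annotating major cell types well and missing rare populations can be illustrated with a minimal sketch. It assumes overall accuracy and per-cell-type recall as the metrics, which are not necessarily those used in the study. The labels are toy data, and the "classifier" is a hypothetical one that always predicts the majority type. The function names `overall_accuracy` and `per_type_recall` are illustrative only.

```python
from collections import Counter


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label equals the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_type_recall(true_labels, predicted_labels):
    """Recall for each true cell type: correctly labelled cells / cells of that type."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be of equal length")
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cell_type: hits[cell_type] / n for cell_type, n in totals.items()}


# Toy data (assumption): 95 cells of a major type and 5 cells of a rare type.
true_labels = ["T cell"] * 95 + ["dendritic cell"] * 5
# Hypothetical classifier that never predicts the rare type.
predicted_labels = ["T cell"] * 100

accuracy = overall_accuracy(true_labels, predicted_labels)
recall = per_type_recall(true_labels, predicted_labels)

print(f"overall accuracy: {accuracy:.2f}")
for cell_type, value in recall.items():
    print(f"recall for {cell_type}: {value:.2f}")
```

In this toy case the overall accuracy is high even though every rare cell is mislabelled, which is why a per-cell-type view is useful alongside a single summary score.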
