A comparison of automatic cell identification methods for single-cell RNA sequencing data

Background: Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotation to determine cell identities, which is time-consuming and irreproducible. The exponential growth in the number of cells and samples has prompted the adaptation and development of supervised classification methods for automatic cell identification.

Results: Here, we benchmarked 20 classification methods that automatically assign cell identities, including single-cell-specific and general-purpose classifiers. The methods were evaluated using eight publicly available single-cell RNA-sequencing datasets of different sizes, technologies, species, and complexity. Performance was evaluated based on accuracy, percentage of unclassified cells, and computation time. We further evaluated the methods' sensitivity to the input features and their performance across different annotation levels and datasets. We found that most classifiers performed well on a variety of datasets, with decreased accuracy for complex datasets with overlapping classes or deep annotations. The general-purpose support vector machine (SVM) classifier had the best overall performance across the different experiments.

Conclusions: We present a comprehensive evaluation of automatic cell identification methods for single-cell RNA-sequencing data. All the code used for the evaluation is available on GitHub (https://github.com/tabdelaal/scRNAseq_Benchmark). Additionally, we provide a Snakemake workflow to facilitate the benchmarking and to support extension to new methods and new datasets (https://github.com/tabdelaal/scRNAseq_Benchmark/tree/snakemake_and_docker).
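As an illustration of the three evaluation criteria named above (accuracy, percentage of unclassified cells, and computation time), the following is a minimal sketch and not the benchmark's actual code. The function names, the "Unassigned" rejection label, and the choice to compute accuracy over classified cells only are all assumptions made for this example.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical label a classifier emits when it declines to assign a cell type.
UNASSIGNED = "Unassigned"


def evaluate_predictions(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    unassigned: str = UNASSIGNED,
) -> Dict[str, float]:
    """Return accuracy over classified cells and the percentage left unclassified."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have equal length")
    n_total = len(true_labels)
    if n_total == 0:
        raise ValueError("at least one cell is required")

    # Assumption: cells the classifier rejected are excluded from the accuracy.
    classified = [
        (truth, pred)
        for truth, pred in zip(true_labels, predicted_labels)
        if pred != unassigned
    ]
    n_correct = sum(1 for truth, pred in classified if truth == pred)
    accuracy = n_correct / len(classified) if classified else 0.0
    pct_unclassified = 100.0 * (n_total - len(classified)) / n_total
    return {"accuracy": accuracy, "pct_unclassified": pct_unclassified}


def timed_predict(
    classifier: Callable[[Sequence[object]], List[str]],
    cells: Sequence[object],
) -> Tuple[List[str], float]:
    """Run a prediction function and return its labels with wall-clock seconds."""
    start = time.perf_counter()
    predictions = classifier(cells)
    elapsed = time.perf_counter() - start
    return predictions, elapsed


if __name__ == "__main__":
    truth = ["alpha", "beta", "alpha", "delta"]
    # Toy stand-in for a trained classifier; it ignores its input.
    toy_classifier = lambda cells: ["alpha", "beta", UNASSIGNED, "beta"]
    predicted, seconds = timed_predict(toy_classifier, truth)
    print(evaluate_predictions(truth, predicted), f"{seconds:.6f} s")
```

Reporting the unclassified percentage alongside accuracy matters because a classifier with a rejection option can raise its accuracy simply by declining to label difficult cells.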
