clustifyr: an R package for automated single-cell RNA sequencing cluster classification

Background: In single-cell RNA sequencing (scRNA-seq) analysis, assignment of likely cell types remains a time-consuming, error-prone, and biased process. Current packages for identity assignment use limited types of reference data and often have rigid data structure requirements. As such, a more flexible tool, capable of handling multiple types of reference data and data structures, would be beneficial.

Findings: To address difficulties in cluster identity assignment, we developed the clustifyr R package. The package leverages external datasets, including gene expression profiles from scRNA-seq, bulk RNA-seq, microarray expression data, and/or signature gene lists, to assign likely cell types. We benchmark various parameters of a correlation-based approach and also implement a variety of gene list enrichment methods. By providing tools for exploratory data analysis, we demonstrate the feasibility of a simple and effective data-driven approach for cell type assignment in scRNA-seq cell clusters.

Conclusions: clustifyr is a lightweight and effective cell type assignment tool developed for compatibility with various scRNA-seq analysis workflows. clustifyr is publicly available at https://github.com/rnabioco/clustifyr.
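The correlation-based approach mentioned in the findings can be illustrated with a minimal sketch. It is written in Python for illustration only and is not clustifyr's API or implementation. It assumes that each cluster and each reference cell type is summarized as an average expression vector over the same ordered gene set, that Spearman rank correlation is the similarity measure, and that a cutoff of 0.5 separates confident calls from unassigned clusters. The function names (`ranks`, `pearson`, `spearman`, `assign_clusters`), the cutoff value, and the toy data are all hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def assign_clusters(cluster_profiles, reference_profiles, threshold=0.5):
    """Label each cluster with its best-correlated reference cell type.

    The threshold is an illustrative assumption, not a recommended value.
    """
    calls = {}
    for cluster, profile in cluster_profiles.items():
        scores = {
            name: spearman(profile, ref)
            for name, ref in reference_profiles.items()
        }
        best = max(scores, key=scores.get)
        calls[cluster] = best if scores[best] >= threshold else "unassigned"
    return calls


# Toy average-expression vectors over five hypothetical genes.
reference = {
    "T cell": [9.0, 8.0, 0.5, 0.2, 1.0],
    "B cell": [0.3, 0.5, 9.5, 8.0, 1.2],
}
clusters = {
    "cluster_0": [7.5, 6.0, 0.1, 0.4, 0.9],
    "cluster_1": [0.2, 0.6, 8.1, 7.7, 1.5],
    "cluster_2": [2.0, 2.0, 2.0, 2.0, 2.0],  # uninformative profile
}

calls = assign_clusters(clusters, reference)
for cluster, label in calls.items():
    print(cluster, label)
```

In this toy example the first two clusters receive the label of the reference profile they rank-correlate with most strongly, while the flat profile falls below the cutoff and stays unassigned. Working at the cluster level rather than per cell keeps the comparison small, which is in line with the lightweight design described above.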
