Uncertainty Estimation for Single-cell Label Transfer

Downstream analysis of single-cell gene expression matrices requires a cell type label for each cell. A cell type label identifies the (possibly heterogeneous) group to which a cell belongs. Machine learning methods that automate label assignment are trained on reference datasets whose cell type labels are already defined. However, these methods are prone to error because of preprocessing errors and the dynamic nature of cellular states, so it is essential to measure the uncertainty associated with each classification. Here, we hypothesize that conformal prediction provides a principled approach to this problem. We examine inductive conformal predictors (ICPs) as classifiers for single-cell label transfer. ICPs yield well-calibrated models whose uncertainty estimates are intuitive and easy to interpret, and the results are encouraging. We also consider a confidence-credibility conformal prediction setup that predicts single labels accurately at the desired error level. Such a model can also reject cells whose type was not observed in the reference dataset. However, the presence of unknown cell types violates the exchangeability assumption underlying conformal prediction, and the ability to reject them depends heavily on the quality of batch correction. We envision further work on detecting unknown cell types and on using conformal prediction to evaluate batch correction methods.
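To make the inductive conformal setup concrete, the following is a minimal sketch and not the authors' implementation. It assumes a nearest-centroid nonconformity score (distance of a cell to a class centroid) and synthetic Gaussian "expression" data. The function names, the significance level `epsilon = 0.1`, and the data are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Proper training step: one centroid per cell type (assumed underlying model)."""
    return np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])

def nonconformity(X, centroids):
    """Distance of each cell to each class centroid, shape (n_cells, n_classes)."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

def icp_p_values(cal_scores, test_scores):
    """p-value of each candidate label: rank of its score among calibration scores."""
    n_cal = cal_scores.shape[0]
    ge = (cal_scores[None, None, :] >= test_scores[:, :, None]).sum(axis=2)
    return (ge + 1) / (n_cal + 1)

def confidence_credibility(p_values):
    """Forced single prediction with confidence and credibility."""
    sorted_p = np.sort(p_values, axis=1)
    label = p_values.argmax(axis=1)
    credibility = sorted_p[:, -1]        # largest p-value
    confidence = 1.0 - sorted_p[:, -2]   # one minus second-largest p-value
    return label, confidence, credibility

# Synthetic data: three well-separated "cell types" in five dimensions (assumption).
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0], [6, 0, 0, 0, 0], [0, 6, 0, 0, 0]], dtype=float)

def sample(n):
    y = rng.integers(0, 3, size=n)
    return centers[y] + rng.normal(size=(n, 5)), y

X_train, y_train = sample(300)   # proper training set
X_cal, y_cal = sample(300)       # calibration set
X_test, y_test = sample(300)     # query cells

centroids = fit_centroids(X_train, y_train, n_classes=3)
cal_scores = nonconformity(X_cal, centroids)[np.arange(len(y_cal)), y_cal]
p_values = icp_p_values(cal_scores, nonconformity(X_test, centroids))

epsilon = 0.1                                  # desired error level
pred_sets = p_values > epsilon                 # label sets at significance epsilon
coverage = pred_sets[np.arange(len(y_test)), y_test].mean()
label, confidence, credibility = confidence_credibility(p_values)

# A cell far from every reference type: all p-values are small, so it is rejected.
X_novel = np.full((1, 5), 20.0)
novel_p = icp_p_values(cal_scores, nonconformity(X_novel, centroids))
novel_set = novel_p[0] > epsilon
print(f"coverage={coverage:.3f}, novel cell credibility={novel_p.max():.4f}")
```

In this sketch a cell is rejected when its prediction set is empty, that is, when its credibility falls below `epsilon`. That only works here because the novel cell lies far from every centroid; with residual batch effects, known cell types could look equally distant, which is the dependence on batch correction noted above.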
