Sample demultiplexing, multiplet detection, experiment planning and novel cell type verification in single cell sequencing

Identifying multiplets and removing them before downstream analysis is essential for improving the scalability and reliability of single cell RNA sequencing (scRNA-seq). High multiplet rates create artificial cell types in the dataset. Sample barcoding, including the cell hashing technology and the MULTI-seq technology, enables analytical identification of a fraction of the multiplets in a scRNA-seq dataset. We propose a Gaussian-mixture-model-based multiplet identification method, GMM-Demux. GMM-Demux accurately identifies and removes the sample-barcoding-detectable multiplets and estimates the percentage of sample-barcoding-undetectable multiplets in the remaining dataset. GMM-Demux describes the droplet formation process with an augmented binomial probabilistic model, and uses the model to authenticate cell types discovered from a scRNA-seq dataset. We conducted two cell-hashing experiments, collected a public cell-hashing dataset, and generated a simulated cell-hashing dataset. We compared the classification results of GMM-Demux against those of a state-of-the-art heuristic-based classifier. We show that GMM-Demux is more accurate and more stable, reduces the error rate by up to 69×, and reliably recognizes 9 multiplet-induced fake cell types and 8 real cell types in a PBMC scRNA-seq dataset.
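The distinction between sample-barcoding-detectable and sample-barcoding-undetectable multiplets can be illustrated with a small calculation. The sketch below is not the augmented binomial model used by GMM-Demux; it assumes, purely for illustration, that the cells in a multiplet are drawn independently from the pooled samples, so that a multiplet is undetectable exactly when all of its cells carry the same sample barcode. The function names and the `sample_fractions` parameter are hypothetical.

```python
def undetectable_multiplet_fraction(sample_fractions, cells_per_droplet=2):
    """Fraction of multiplets whose cells all come from the same sample.

    Assumes each cell in a multiplet is drawn independently, with
    probability equal to its sample's share of the pooled cells
    (an illustrative simplification, not the GMM-Demux model).
    """
    if abs(sum(sample_fractions) - 1.0) > 1e-9:
        raise ValueError("sample_fractions must sum to 1")
    if cells_per_droplet < 2:
        raise ValueError("a multiplet contains at least two cells")
    # All cells share one barcode: sum over samples of p ** k.
    return sum(p ** cells_per_droplet for p in sample_fractions)


def detectable_multiplet_fraction(sample_fractions, cells_per_droplet=2):
    """Fraction of multiplets carrying at least two distinct barcodes."""
    return 1.0 - undetectable_multiplet_fraction(
        sample_fractions, cells_per_droplet
    )


# Four equally sized samples: one in four doublets is same-sample.
equal_four = [0.25] * 4
print(undetectable_multiplet_fraction(equal_four))
print(detectable_multiplet_fraction(equal_four))
```

Under these assumptions, pooling more samples shrinks the undetectable share of doublets (1/S for S equally sized samples), which is why the number of barcoded samples matters when planning an experiment.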
