CB2 distinguishes cells from background barcodes in 10x Genomics data

An important challenge in pre-processing data from the 10x Genomics Chromium platform is distinguishing barcodes associated with real cells from those binding background reads. Existing methods test barcodes individually and therefore do not leverage the strong cell-to-cell correlation present in most datasets. To improve the power to identify real cells and rare subpopulations, we introduce CB2, a cluster-based approach for distinguishing real cells from background barcodes.
