scCDC: a computational method for gene-specific contamination detection and correction in single-cell and single-nucleus RNA-seq data

In droplet-based single-cell RNA-seq (scRNA-seq) and single-nucleus RNA-seq (snRNA-seq) assays, systematic contamination by ambient RNA molecules biases the estimation of genuine transcriptional levels. Several computational methods have been developed to correct this contamination. However, these methods do not distinguish the contamination-causing genes, and thus they either under- or over-corrected the contamination in our in-house snRNA-seq data of virgin and lactating mammary glands. Hence, we developed scCDC as the first method that specifically detects the contamination-causing genes and corrects the expression counts of only these genes. Benchmarked against existing methods on synthetic and real scRNA-seq and snRNA-seq datasets, scCDC achieved the best contamination correction accuracy with minimal data alteration. Moreover, scCDC can be applied to processed scRNA-seq and snRNA-seq data from which empty droplets have already been removed. In conclusion, scCDC is a flexible, accurate decontamination method that detects the contamination-causing genes, corrects the contamination, and avoids the over-correction of other genes.
