Sampling artifacts in single-cell genomics cohort studies

Robust protocols and automation now enable large-scale single-cell RNA and ATAC sequencing experiments and their application to biobank and clinical cohorts. However, technical biases introduced during sample acquisition can compromise the reliability and reproducibility of results, so systematic benchmarking is required before entering large-scale data production. Here, we report the existence and extent of gene expression and chromatin accessibility artifacts introduced during sampling and identify experimental and computational solutions for their prevention.
