Batch effects and the effective design of single-cell gene expression studies

Single-cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing on the Fluidigm C1 single-cell platform. To do so, we processed three C1 replicates from each of three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples to account for amplification bias. The major source of variation in the gene expression data was genotype, but we also observed substantial variation between the technical replicates. The conversion of reads to molecules using the UMIs was affected by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.
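
As an illustration of what converting reads to molecules with UMIs involves, the following is a minimal sketch and not the pipeline used in the study. It assumes each read has already been assigned a cell, a gene, and a UMI sequence; the function name `count_molecules` and the toy reads are hypothetical. Reads that share the same cell, gene, and UMI are treated as amplification duplicates of a single original molecule.

```python
from collections import defaultdict


def count_molecules(reads):
    """Return (read_counts, molecule_counts) keyed by (cell, gene).

    `reads` is an iterable of (cell, gene, umi) tuples. This sketch
    assumes error-free UMIs; it does not correct sequencing errors in
    the UMI or use mapping position, which real tools may do.
    """
    read_counts = defaultdict(int)
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        read_counts[(cell, gene)] += 1      # every read counts once
        umis_seen[(cell, gene)].add(umi)    # duplicate UMIs collapse
    molecule_counts = {key: len(umis) for key, umis in umis_seen.items()}
    return dict(read_counts), molecule_counts


# Hypothetical toy input: three reads for GENE_A in cell1, two of which
# carry the same UMI and therefore count as one molecule.
reads = [
    ("cell1", "GENE_A", "AACGT"),
    ("cell1", "GENE_A", "AACGT"),
    ("cell1", "GENE_A", "TTGCA"),
    ("cell1", "GENE_B", "AACGT"),
    ("cell2", "GENE_A", "AACGT"),
]
read_counts, molecule_counts = count_molecules(reads)
```

In this toy input, GENE_A in cell1 has three reads but two molecules. The ratio of reads to molecules per cell and gene is the kind of quantity that, according to the abstract, varies with both biological and technical factors.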
