Design and power analysis for multi-sample single cell genomics experiments

Background
The identification of genes associated with specific experimental conditions, genotypes or phenotypes through differential expression analysis has long been the cornerstone of transcriptomic analysis. Single cell RNA-seq is revolutionizing transcriptomics. It enables interindividual differential gene expression analysis and the identification of genetic variants associated with gene expression, so-called expression quantitative trait loci (eQTLs), at cell-type resolution. Current methods for power analysis and experimental design guidance either do not account for the specific characteristics of single cell data or are not suited to modeling interindividual comparisons.

Results
Here we present a statistical framework for experimental design and power analysis of single cell differential gene expression between groups of individuals and of expression quantitative trait locus analysis. The model relates sample size, number of cells per individual and sequencing depth to the power of detecting differentially expressed genes within individual cell types. Power analysis is based on data-driven priors, derived from the literature or from pilot experiments, that span a wide range of application scenarios and single cell RNA-seq platforms. Using these priors, we show that for a fixed budget the number of cells per individual is the major determinant of power.

Conclusion
Our model is general, allows systematic comparison of alternative experimental designs, and can thus be used to guide experimental design to optimize power. For a wide range of applications, shallow sequencing of high numbers of cells per individual leads to higher overall power than deep sequencing of fewer cells. The model is implemented in the R package scPower.
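
The budget trade-off described above can be illustrated with a small toy model. The following is a minimal sketch under loudly stated assumptions, not the scPower model: it assumes a linear cost model (a per-cell library cost plus a per-read sequencing cost), a saturating UMI curve in which reads sample uniformly from a fixed pool of captured molecules per cell, and a z-test on log pseudobulk counts whose per-individual variance is a biological term plus a Poisson term. All function names (`reads_per_cell`, `umis_per_cell`, `de_power`) and every parameter value are hypothetical and chosen only for illustration.

```python
"""Toy budget/power trade-off for a two-group, cell-type-specific DE comparison.

Hypothetical illustration only: the cost model, UMI saturation curve and
pseudobulk z-test below are assumptions, NOT the scPower model.
"""
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Hypothetical parameter values chosen for illustration, not fitted priors.
EXAMPLE_PARAMS = dict(
    cell_type_fraction=0.2,     # share of each individual's cells in the target cell type
    gene_fraction=1e-5,         # share of a cell's transcripts coming from the gene
    log_fold_change=0.5,        # natural-log fold change between the two groups
    biological_sd=0.2,          # between-individual SD of log expression
    alpha=0.05 / 10_000,        # Bonferroni-style threshold for 10,000 tested genes
    molecules_per_cell=20_000,  # assumed library complexity (captured molecules per cell)
)


def reads_per_cell(budget, n_individuals, cells_per_individual,
                   cost_per_cell, cost_per_read):
    """Reads per cell left after paying an assumed linear per-cell library cost."""
    per_individual = budget / n_individuals
    remaining = per_individual - cells_per_individual * cost_per_cell
    if remaining <= 0:
        return 0.0
    return remaining / (cells_per_individual * cost_per_read)


def umis_per_cell(reads, molecules_per_cell):
    """Expected distinct molecules seen when reads sample uniformly from a fixed pool."""
    return molecules_per_cell * (1.0 - math.exp(-reads / molecules_per_cell))


def de_power(n_individuals, cells_per_individual, reads, *, cell_type_fraction,
             gene_fraction, log_fold_change, biological_sd, alpha,
             molecules_per_cell):
    """Approximate two-sided power of a z-test on log pseudobulk counts."""
    k = n_individuals // 2  # individuals per group in a balanced design
    if k < 1:
        raise ValueError("need at least one individual per group")
    # Expected pseudobulk UMI count of the gene per individual in the cell type.
    lam = (cells_per_individual * cell_type_fraction * gene_fraction
           * umis_per_cell(reads, molecules_per_cell))
    if lam <= 0:
        return 0.0
    # Log-scale variance per individual: biological + Poisson (delta method: 1/lambda).
    per_individual_var = biological_sd ** 2 + 1.0 / lam
    se = math.sqrt(2.0 * per_individual_var / k)  # SE of the difference in group means
    z = log_fold_change / se
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(z - z_crit) + _STD_NORMAL.cdf(-z - z_crit)


if __name__ == "__main__":
    budget, n = 10_000, 20  # same total budget for every design
    for cells in (250, 500, 1_000, 2_000, 4_000):
        r = reads_per_cell(budget, n, cells, cost_per_cell=0.1, cost_per_read=5e-6)
        power = de_power(n, cells, r, **EXAMPLE_PARAMS)
        print(f"{cells:5d} cells/individual  {r:9.0f} reads/cell  power={power:.3f}")
```

With these made-up numbers, power rises from 250 to 2,000 cells per individual and falls again at 4,000. Under the assumed saturating UMI curve, the extra depth of the deeply sequenced designs adds few new molecules, whereas at 4,000 cells the per-cell cost leaves too few reads per cell. Where the optimum lies depends entirely on the assumed costs and parameters.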
