Simulation, power evaluation and sample size recommendation for single-cell RNA-seq

Motivation: Determining the sample size needed for adequate power to detect statistically significant effects is a crucial step in designing high-throughput experiments. A number of methods and tools exist for sample size calculation in differential expression (DE) analysis of microarray and bulk RNA-seq data, but the topic remains understudied for single-cell RNA sequencing (scRNA-seq). Moreover, characteristics unique to scRNA-seq data, such as sparsity and heterogeneity, make the problem more challenging.

Results: We propose POWSC, a simulation-based method that provides power evaluation and sample size recommendation for scRNA-seq DE analysis. POWSC consists of a data simulator that generates realistic expression data and a power assessor that provides a comprehensive evaluation and visualization of the relationship between power and sample size. The POWSC data simulator outperforms two other state-of-the-art simulators in capturing key characteristics of real datasets. The power assessor provides a variety of power evaluations, including stratified and marginal power analyses, for differential expression in either of two forms (phase transition or magnitude tuning) under different comparison scenarios. In addition, POWSC provides information for optimizing the tradeoff between sample size and sequencing depth when the total number of reads is fixed.

Availability: POWSC is an open-source R package available at https://github.com/suke18/POWSC.

Supplementary information: Supplementary data are available at Bioinformatics online.
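The Results paragraph describes a simulate-then-test loop: generate expression data for two groups, apply a DE test, and record how often a true difference is detected. The Python sketch below illustrates that idea under simplifying assumptions. Each gene follows a toy two-part model in which a cell is detected with probability p_detect and, if detected, has a normally distributed log-scale value. "Phase transition" is modeled as a change in p_detect, and "magnitude tuning" as a change in the mean of the detected part. The names simulate_group, phase_test, magnitude_test and estimate_power are hypothetical and are not POWSC's R API. The two z-tests are simple stand-ins, not the tests POWSC uses.

```python
import random
from statistics import NormalDist, fmean, variance


def simulate_group(n_cells, p_detect, mean_log, sd_log, rng):
    """Draw one gene's values for n_cells cells from a toy two-part model.

    Each cell is detected with probability p_detect. Undetected cells are
    recorded as None (a zero count); detected cells get a normally
    distributed log-scale expression value.
    """
    return [rng.gauss(mean_log, sd_log) if rng.random() < p_detect else None
            for _ in range(n_cells)]


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def phase_test(a, b):
    """Two-proportion z-test on detection rates (stand-in for the 'phase' part)."""
    ka = sum(x is not None for x in a)
    kb = sum(x is not None for x in b)
    pooled = (ka + kb) / (len(a) + len(b))
    se = (pooled * (1 - pooled) * (1 / len(a) + 1 / len(b))) ** 0.5
    if se == 0:  # all cells detected or all undetected: no evidence either way
        return 1.0
    return two_sided_p((ka / len(a) - kb / len(b)) / se)


def magnitude_test(a, b):
    """Welch-type z-test on detected cells only (stand-in for the 'magnitude' part)."""
    xa = [x for x in a if x is not None]
    xb = [x for x in b if x is not None]
    if len(xa) < 2 or len(xb) < 2:  # too few detected cells to estimate variance
        return 1.0
    se = (variance(xa) / len(xa) + variance(xb) / len(xb)) ** 0.5
    return two_sided_p((fmean(xa) - fmean(xb)) / se)


def estimate_power(test, n_per_group, group1, group2, alpha=0.05,
                   n_sims=200, seed=1):
    """Fraction of simulated replicates in which `test` rejects at level alpha.

    group1 and group2 are (p_detect, mean_log, sd_log) tuples. When the two
    groups are identical, the result estimates the type I error rate instead.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a = simulate_group(n_per_group, *group1, rng)
        b = simulate_group(n_per_group, *group2, rng)
        if test(a, b) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    baseline = (0.3, 2.0, 1.0)
    phase_shift = (0.45, 2.0, 1.0)      # detection rate changes, mean does not
    magnitude_shift = (0.3, 2.5, 1.0)   # mean changes, detection rate does not
    for n in (25, 50, 100, 200, 400):
        p_phase = estimate_power(phase_test, n, baseline, phase_shift)
        p_mag = estimate_power(magnitude_test, n, baseline, magnitude_shift)
        print(f"cells/group={n:4d}  phase power={p_phase:.2f}  "
              f"magnitude power={p_mag:.2f}")
```

Running the script prints estimated power at increasing numbers of cells per group. Under this toy model, the smallest group size whose power exceeds a chosen target (for example 0.8) is the recommended sample size. The sketch handles a single gene and omits multiple-testing control across genes and stratification by expression level.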
