SymSim: simulating multi-faceted variability in single cell RNA sequencing

The abundance of new computational methods for processing and interpreting transcriptomes at the single-cell level raises the need for in silico platforms for evaluation and validation. Simulated datasets that resemble the properties of real datasets can aid method development and prioritization, as well as questions of experimental design, by providing an objective ground truth. Here, we present SymSim, a simulator that explicitly models the processes that give rise to the data observed in single-cell RNA-Seq experiments. The components of the SymSim pipeline correspond to the three primary sources of variation in single-cell RNA-Seq data: noise intrinsic to the process of transcription, extrinsic variation indicative of different cell states (both discrete and continuous), and technical variation due to low sensitivity, measurement noise, and bias. Unlike in other simulators, the parameters that govern the simulation directly represent meaningful properties such as the mRNA capture rate, the number of PCR cycles, sequencing depth, or the use of unique molecular identifiers (UMIs). We demonstrate how SymSim can be used to benchmark clustering and differential expression methods and to examine how various parameters affect their performance. We also show how SymSim can be used to estimate the number of cells required to detect a rare population and how far this number deviates from the theoretical lower bound as data quality decreases. SymSim is publicly available as an R package and allows users to simulate datasets with desired properties or matched to experimental data.
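To make the intrinsic and technical components more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not SymSim's R implementation: intrinsic noise is assumed to follow a two-state (telegraph) promoter model, whose steady-state mRNA counts are a Beta-Poisson mixture; technical noise is assumed to consist of independent (binomial) mRNA capture, stochastic PCR duplication with a fixed per-cycle efficiency, and read sampling proportional to amplified abundance. Extrinsic (cell-state) variation is omitted. All function and parameter names (`true_counts`, `observe`, `k_on`, `k_off`, `s`, `pcr_efficiency`, and so on) are hypothetical and are not the package's API.

```python
import math
import random
from collections import Counter


def sample_poisson(lam, rng):
    """Poisson draw via Knuth's method; large means are split into chunks
    (a sum of independent Poissons is Poisson) so exp(-lam) cannot underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit, k, prod = math.exp(-chunk), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        total += k
    return total


def sample_binomial(n, p, rng):
    """Successes among n independent trials, each with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def true_counts(kinetics, rng):
    """Intrinsic noise under an ASSUMED two-state (telegraph) model at steady state.

    kinetics: gene -> (k_on, k_off, s), rates in units of mRNA degradation.
    Counts follow a Beta-Poisson mixture: q ~ Beta(k_on, k_off),
    count ~ Poisson(s * q)."""
    return {gene: sample_poisson(s * rng.betavariate(k_on, k_off), rng)
            for gene, (k_on, k_off, s) in kinetics.items()}


def observe(counts, capture_rate, pcr_cycles, pcr_efficiency, depth,
            use_umi, rng):
    """Technical noise (hypothetical, simplified): capture, then either UMI
    counting or PCR amplification followed by read sampling."""
    # Each true mRNA molecule is captured independently with capture_rate.
    captured = {g: sample_binomial(n, capture_rate, rng)
                for g, n in counts.items()}
    if use_umi:
        # Assumes every captured molecule carries a distinct UMI and is
        # sequenced at least once, so UMI counts equal captured molecules.
        return captured
    # Each PCR cycle duplicates every molecule with probability pcr_efficiency.
    amplified = {}
    for g, n in captured.items():
        for _ in range(pcr_cycles):
            n += sample_binomial(n, pcr_efficiency, rng)
        amplified[g] = n
    genes = list(amplified)
    weights = [amplified[g] for g in genes]
    if sum(weights) == 0:
        return {g: 0 for g in genes}
    # Sequencing: `depth` reads drawn in proportion to amplified abundance.
    reads = Counter(rng.choices(genes, weights=weights, k=depth))
    return {g: reads[g] for g in genes}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical kinetics: geneA is "on" ~20% of the time, geneB ~83%.
    kinetics = {"geneA": (0.5, 2.0, 50.0), "geneB": (5.0, 1.0, 20.0)}
    truth = true_counts(kinetics, rng)
    print("true:", truth)
    print("UMI:", observe(truth, 0.1, 8, 0.9, 1000, True, rng))
    print("reads:", observe(truth, 0.1, 8, 0.9, 1000, False, rng))
```

Keeping the technical steps as separate, explicitly parameterized stages mirrors the idea that each parameter corresponds to a physical property of the experiment (capture rate, PCR cycles, sequencing depth, UMIs), so changing one stage leaves the others untouched.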
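The rare-population question can be framed, under simplifying assumptions, as a binomial calculation. If each sequenced cell independently belongs to a population of frequency `f`, and the population counts as detectable once at least `k` of its cells are sampled, then the smallest number of cells that reaches `k` with probability at least `confidence` is an idealized lower bound that ignores data quality. This is only a sketch: the threshold `k`, the names `prob_at_least` and `min_cells`, and this particular definition of the bound are assumptions and may differ from the one used in the paper.

```python
import math


def prob_at_least(n_cells, k, f):
    """P(X >= k) for X ~ Binomial(n_cells, f), summing the lower tail in log space."""
    if n_cells < k:
        return 0.0
    lower_tail = 0.0
    for i in range(k):
        log_pmf = (math.lgamma(n_cells + 1) - math.lgamma(i + 1)
                   - math.lgamma(n_cells - i + 1)
                   + i * math.log(f) + (n_cells - i) * math.log1p(-f))
        lower_tail += math.exp(log_pmf)
    return max(0.0, 1.0 - lower_tail)


def min_cells(f, k, confidence=0.95):
    """Smallest number of cells for which at least k cells of a population
    with frequency f (0 < f < 1) are sampled with probability >= confidence.
    The detection threshold k is a hypothetical modeling choice."""
    n = k
    while prob_at_least(n, k, f) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    for f in (0.01, 0.005):
        for k in (1, 10):
            print(f"f={f}, k={k}: {min_cells(f, k)} cells")
```

With `f = 0.01`, `k = 1` and 95% confidence, `min_cells` returns 299, because 1 - 0.99^N first reaches 0.95 at N = 299. In real data, low sensitivity and measurement noise can blur a rare population, so more cells than this idealized count may be needed; measuring that gap by simulation is the use of SymSim described above.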
