scMultiSim: simulation of multi-modality single cell data guided by cell-cell interactions and gene regulatory networks

Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing simulators typically model only one or two of the biological factors or mechanisms that shape the output data, which limits their capacity to reproduce the complexity and multi-modality of real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations, while accounting for the relationships between modalities. scMultiSim jointly models the biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, and it also incorporates technical noise. Moreover, it allows users to adjust the effect of each factor easily. We validated scMultiSim’s simulated biological effects and demonstrated its applications by benchmarking methods for a wide range of computational tasks, including cell clustering and trajectory inference, multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference, and CCI inference using spatially resolved gene expression data. Compared to existing simulators, scMultiSim supports benchmarking of a much broader range of existing computational problems, and potentially of new tasks as well.
