A single-cell expression simulator guided by gene regulatory networks

A common approach to benchmarking single-cell transcriptomics tools is to generate synthetic datasets that statistically resemble experimental data. However, most existing single-cell simulators do not incorporate the transcription factor-gene regulatory interactions that underlie expression dynamics. Here, we present SERGIO, a simulator of single-cell gene expression data that models the stochastic nature of transcription as well as the regulation of genes by multiple transcription factors according to a user-provided gene regulatory network (GRN). SERGIO can simulate any number of cell types in steady state or cells differentiating to multiple fates. We show that datasets generated by SERGIO are statistically comparable to experimental data generated by Illumina HiSeq2000, Drop-seq, Illumina 10X Chromium, and Smart-seq. We use SERGIO to benchmark several single-cell analysis tools, including GRN inference methods, and identify Tcf7, Gata3, and Bcl11b as key drivers of T cell differentiation by performing in silico knockout experiments. SERGIO is freely available at https://github.com/PayamDiba/SERGIO.
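To make the idea of GRN-guided stochastic simulation concrete, the following is a minimal sketch, not SERGIO's actual code or API. It assumes a chemical Langevin style model integrated with the Euler-Maruyama scheme, Hill-function regulation with a fixed half-response, and additive contributions from multiple regulators. The names `hill_activation` and `simulate_grn` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def hill_activation(tf_level, half_response=1.0, coop=2.0):
    """Hill function: fraction of maximal activation at a given TF level."""
    return tf_level**coop / (half_response**coop + tf_level**coop)


def simulate_grn(edges, basal, n_genes, n_steps=2000, dt=0.01,
                 decay=0.8, noise=1.0, knockout=(), seed=0):
    """Simulate one cell of a toy GRN (illustrative assumptions throughout).

    edges:    list of (regulator, target, strength); strength < 0 is repression.
    basal:    dict mapping a master regulator's index to its production rate.
    knockout: indices of genes clamped to zero (a toy in silico knockout).
    """
    rng = np.random.default_rng(seed)
    ko = list(knockout)
    x = np.zeros(n_genes)
    for _ in range(n_steps):
        # Production rate: basal input plus the sum over all regulators.
        prod = np.zeros(n_genes)
        for gene, rate in basal.items():
            prod[gene] = rate
        for reg, tgt, k in edges:
            act = hill_activation(x[reg])
            prod[tgt] += k * act if k > 0 else -k * (1.0 - act)
        deg = decay * x
        # Independent Wiener increments for production and degradation.
        dw = rng.normal(0.0, np.sqrt(dt), size=(2, n_genes))
        x = x + (prod - deg) * dt \
              + noise * (np.sqrt(prod) * dw[0] - np.sqrt(deg) * dw[1])
        x = np.maximum(x, 0.0)  # expression levels cannot be negative
        x[ko] = 0.0             # knocked-out genes stay silent
    return x


# Gene 0 is a master regulator that activates gene 1 and represses gene 2.
toy_edges = [(0, 1, 3.0), (0, 2, -3.0)]
wild_type = simulate_grn(toy_edges, basal={0: 2.0}, n_genes=3)
knocked_out = simulate_grn(toy_edges, basal={0: 2.0}, n_genes=3, knockout=[0])
```

In this toy setup, knocking out gene 0 removes activation of gene 1 and lifts repression of gene 2, which is the kind of downstream effect an in silico knockout experiment is meant to expose. Technical noise such as dropout, which a realistic single-cell simulator would also need, is deliberately left out of the sketch.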
