Modeling gene regulatory networks using neural network architectures

Gene regulatory networks (GRNs) encode the complex molecular interactions that govern cell identity. Here we propose DeepSEM, a deep generative model that jointly infers GRNs and biologically meaningful representations of single-cell RNA sequencing (scRNA-seq) data. In particular, we developed a neural network version of the structural equation model (SEM) to explicitly model the regulatory relationships among genes. Benchmark results show that DeepSEM achieves comparable or better performance than state-of-the-art methods on a variety of single-cell computational tasks, such as GRN inference, scRNA-seq data visualization, clustering and simulation. In addition, the gene regulations predicted by DeepSEM on cell-type marker genes in the mouse cortex can be validated by epigenetic data, which further demonstrates the accuracy and efficiency of our method. DeepSEM can provide a useful and powerful tool to analyze scRNA-seq data and infer a GRN.

The authors propose a deep learning model that analyzes single-cell RNA sequencing (scRNA-seq) data by explicitly modeling gene regulatory networks (GRNs), outperforming state-of-the-art methods on various tasks, including GRN inference, scRNA-seq analysis and simulation.
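As a rough illustration of the SEM idea mentioned above, the sketch below uses the classical linear SEM, X = WᵀX + Z, in which a weighted adjacency matrix W encodes gene-to-gene regulation and Z is per-gene noise. This is not the DeepSEM implementation: the neural network generalization, the training objective, and all names here (`sem_decode`, `sem_encode`, the toy weights) are assumptions made for illustration only.

```python
import numpy as np

def sem_decode(W, Z):
    """Map noise Z (genes x cells) to expression X under X = W^T X + Z.

    Rearranging gives (I - W^T) X = Z, solved here as a linear system.
    """
    n_genes = W.shape[0]
    return np.linalg.solve(np.eye(n_genes) - W.T, Z)

def sem_encode(W, X):
    """Recover the noise term Z = (I - W^T) X from expression X."""
    n_genes = W.shape[0]
    return (np.eye(n_genes) - W.T) @ X

# Toy network (hypothetical weights): W[i, j] is the effect of gene i on gene j.
rng = np.random.default_rng(0)
n_genes, n_cells = 4, 6
W = np.zeros((n_genes, n_genes))
W[0, 1] = 0.8    # gene 0 activates gene 1
W[1, 2] = -0.5   # gene 1 represses gene 2

Z = rng.normal(size=(n_genes, n_cells))  # per-gene, per-cell noise
X = sem_decode(W, Z)                     # simulated expression
Z_back = sem_encode(W, X)                # round trip back to the noise
```

In this linear form, inferring a GRN amounts to estimating W from observed X; a neural version would replace the fixed linear maps with learned nonlinear ones while keeping W as an explicit parameter.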
