Joint learning of multiple gene networks from single-cell gene expression data

Inferring gene networks from gene expression data is important for understanding the functional organization of cells. With the accumulation of single-cell RNA sequencing (scRNA-seq) data, it has become possible to infer gene networks at the single-cell level. However, because scRNA-seq data exhibit cellular heterogeneity and high sparsity caused by dropout events, traditional network inference methods may not be suitable for them. In this study, we introduce a novel joint Gaussian copula graphical model (JGCGM) to jointly estimate multiple gene networks for multiple cell subgroups from scRNA-seq data. Our model can handle non-Gaussian data with missing values and identify both the common and the unique network structures of multiple cell subgroups, which makes it well suited to scRNA-seq data. Extensive experiments on synthetic data demonstrate that our proposed model outperforms the state-of-the-art network inference models we compare it with. We apply our model to real scRNA-seq data sets to infer gene networks of different cell subgroups. Hub genes in the estimated gene networks are found to be biologically significant.
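To illustrate how a Gaussian copula model can cope with non-Gaussian data and missing values, the following is a minimal sketch of one common ingredient of such models: a rank-based estimate of the latent correlation matrix, computed per cell subgroup from the entries observed in both genes. It is not the JGCGM estimator itself, and the joint penalized estimation that separates common from unique network structure is not shown. The function names, the use of NaN to mark missing entries, and the toy data are all assumptions made for this example.

```python
import math
import numpy as np


def kendall_tau_pairwise(x, y):
    """Kendall's tau-a using only entries where both x and y are observed."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[mask], y[mask]
    n = x.size
    if n < 2:
        return 0.0  # assumption: fall back to zero association
    # Sign of every pairwise difference: concordant pairs contribute +1,
    # discordant pairs -1, tied pairs 0.
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    return float((sx * sy).sum() / (n * (n - 1)))


def latent_correlation(data):
    """Rank-based estimate of the latent Gaussian correlation matrix.

    data: (cells x genes) array; NaN marks a missing entry (assumed encoding).
    Under a Gaussian copula, latent correlation = sin(pi/2 * Kendall's tau).
    """
    p = data.shape[1]
    corr = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            tau = kendall_tau_pairwise(data[:, i], data[:, j])
            corr[i, j] = corr[j, i] = math.sin(math.pi / 2 * tau)
    return corr


# Hypothetical toy input: two cell subgroups, each 5 cells x 3 genes.
subgroups = {
    "group_a": np.array([[1.0, 2.0, 9.0],
                         [2.0, np.nan, 7.0],
                         [3.0, 5.0, np.nan],
                         [4.0, 6.0, 3.0],
                         [5.0, 8.0, 1.0]]),
    "group_b": np.array([[1.0, 4.0, 2.0],
                         [np.nan, 3.0, 5.0],
                         [3.0, 2.0, 1.0],
                         [4.0, np.nan, 6.0],
                         [5.0, 1.0, 3.0]]),
}
latent = {name: latent_correlation(x) for name, x in subgroups.items()}
```

Because the estimate depends only on ranks, it is unchanged by any strictly increasing transformation of a gene's expression values, which is why no Gaussian assumption on the raw data is needed. A joint method would then estimate one sparse precision matrix per subgroup from these matrices while sharing information across subgroups.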
