Model selection for factorial Gaussian graphical models with an application to dynamic regulatory networks

Abstract: Factorial Gaussian graphical models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from high-throughput genomic data. In the search for true regulatory relationships within the vast space of possible networks, these models allow restrictions to be imposed on the dynamic nature of these relationships. Examples are Markov dependencies of low order (some entries of the precision matrix are a priori zero) and equal dependency strengths across time lags (some entries of the precision matrix are assumed to be equal). The precision matrix is then estimated by ℓ1-penalized maximum likelihood, which imposes a further constraint on the absolute values of its entries and results in sparse networks. Selecting the optimal sparsity level is a major challenge for this type of approach. In this paper, we evaluate the performance of a number of model selection criteria for fGGMs on two simulated regulatory networks derived from realistic biological processes. The analysis shows that fGGMs perform well in comparison with other methods for inferring dynamic networks, and that the KLCV criterion in particular performs well for model selection. Finally, we present an application to high-resolution time-course microarray data from the bacterium Neisseria meningitidis, a causative agent of life-threatening infections such as meningitis. The methodology described in this paper is implemented in the R package sglasso, freely available from CRAN at http://CRAN.R-project.org/package=sglasso.
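
The estimation step described in the abstract can be written, as a rough sketch, in the generic form of an ℓ1-penalized Gaussian log-likelihood maximized over a restricted set of precision matrices. The symbols below are assumptions introduced for illustration and do not come from the abstract: S is the empirical covariance matrix, Θ the precision matrix, ρ ≥ 0 the tuning parameter that controls sparsity, and M the set of matrices satisfying the a priori zero and equality restrictions. Penalizing only off-diagonal entries is also an assumption; the exact penalty used in the paper may differ.

```latex
% Generic l1-penalized maximum likelihood for a restricted precision matrix.
% S, \Theta, \rho and \mathcal{M} are illustrative symbols (assumptions).
\hat{\Theta}_{\rho}
  = \arg\max_{\Theta \succ 0,\; \Theta \in \mathcal{M}}
    \Bigl\{ \log\det\Theta - \operatorname{tr}(S\Theta)
            - \rho \sum_{i \neq j} \lvert \theta_{ij} \rvert \Bigr\}
```

In this form, larger values of ρ set more entries of Θ to zero, so choosing ρ amounts to choosing the sparsity level, which is the model selection problem the paper addresses.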
