FastGGM: An Efficient Algorithm for the Inference of Gaussian Graphical Model in Biological Networks

Biological networks provide information for the analysis of human diseases beyond what traditional single-variable analyses offer. The Gaussian graphical model (GGM), a probability model that represents the conditional dependence structure of a set of random variables as a graph, is widely used in the analysis of biological networks, for example to infer interactions or to identify differential networks. However, existing approaches are either not statistically rigorous or too inefficient to make inference on high-dimensional data with tens of thousands of variables. In this study, we propose an efficient algorithm that estimates a GGM and obtains a p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al. (2015). Through simulation studies, we demonstrate that the algorithm is several orders of magnitude faster than the existing implementation of the Ren et al. method, without any loss of accuracy. We then apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer’s disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions, and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm that implements a statistically sound procedure for constructing Gaussian graphical models and making inference with high-dimensional biological data. The algorithm is implemented in an R package named “FastGGM”.
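
To make the per-edge inference concrete, the following is a minimal Python sketch of how a p-value and confidence interval for a single edge could be computed. It is not the FastGGM package code or its API. It assumes that the residuals from regressing each of the two nodes on all remaining variables are already available (Ren et al. obtain them by penalized regression, which is not shown here), and it uses the asymptotic variance (ω_ii ω_jj + ω_ij²)/n for the estimated precision entry ω_ij. The function name `edge_inference` and its arguments are hypothetical.

```python
import math


def edge_inference(res_i, res_j, z_crit=1.959964):
    """Sketch of inference for one edge (i, j) of a Gaussian graphical model.

    res_i, res_j: residuals of nodes i and j after regressing each on all
    other variables (assumed to be computed elsewhere).
    z_crit: normal quantile for the confidence level (default about 95%).
    """
    n = len(res_i)
    if n != len(res_j) or n == 0:
        raise ValueError("residual vectors must be non-empty and equal length")

    # 2x2 residual covariance matrix of the pair.
    s_ii = sum(a * a for a in res_i) / n
    s_jj = sum(b * b for b in res_j) / n
    s_ij = sum(a * b for a, b in zip(res_i, res_j)) / n

    # Invert the 2x2 matrix to estimate the precision entries of the pair.
    det = s_ii * s_jj - s_ij * s_ij
    if det <= 0:
        raise ValueError("residual covariance is singular")
    w_ii = s_jj / det
    w_jj = s_ii / det
    w_ij = -s_ij / det

    # Asymptotic standard error of the off-diagonal precision entry.
    se = math.sqrt((w_ii * w_jj + w_ij * w_ij) / n)
    z = w_ij / se

    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    return {
        "precision": w_ij,
        "partial_correlation": -w_ij / math.sqrt(w_ii * w_jj),
        "p_value": p_value,
        "conf_int": (w_ij - z_crit * se, w_ij + z_crit * se),
    }
```

An edge is then declared present when the p-value falls below a chosen threshold, typically after a multiple-testing correction across all pairs of variables.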