Path-level interpretation of Gaussian graphical models using the pair-path subscore

Background: Construction of networks from cross-sectional biological data is increasingly common. Many recent methods are based on Gaussian graphical modeling and prioritize estimation of conditional pairwise dependencies among nodes in the network. However, it remains challenging to determine how specific paths through the resulting network contribute to overall ‘network-level’ correlations. For biological applications, understanding these relationships is particularly relevant for parsing structural information contained in complex subnetworks.

Results: We propose the pair-path subscore (PPS), a method for interpreting Gaussian graphical models at the level of individual network paths. The score is based on the relative importance of each path in determining the Pearson correlation between its terminal nodes. The PPS is validated using human metabolomics data from the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study, where the observations confirm well-documented biological relationships among the metabolites. We also highlight how the PPS can be used in an exploratory fashion to generate new biological hypotheses. Our method is implemented in the R package pps, available at https://github.com/nathan-gill/pps.

Conclusions: The PPS can be used to probe network structure on a finer scale by investigating which paths in a potentially intricate topology contribute most substantially to marginal behavior. Adding the PPS to the network analysis toolkit may enable researchers to ask new questions about the relationships among nodes in network data.
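The abstract does not state the formula behind the PPS, so the following is only an illustrative sketch of the general idea of attributing a marginal association to individual paths. It assumes the path decomposition of covariance for undirected Gaussian graphical models due to Jones and West (2005), in which the covariance between nodes x and y is a sum over simple paths P from x to y of (-1)^(m+1) times the product of precision-matrix entries along P times det(Ω with the nodes of P removed) / det(Ω), where m is the number of nodes in P. The precision matrix `omega` and all function names are hypothetical and are not taken from the pps package, which is written in R; the normalization that pps applies to obtain its final score is not reproduced here.

```python
from itertools import permutations


def det(m):
    """Determinant by Laplace expansion; the empty matrix has determinant 1."""
    if not m:
        return 1.0
    total = 0.0
    for j, a in enumerate(m[0]):
        if a == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def drop(m, rows, cols):
    """Copy of m without the given row indices and column indices."""
    return [[v for j, v in enumerate(row) if j not in cols]
            for i, row in enumerate(m) if i not in rows]


def simple_paths(omega, x, y):
    """Yield every simple path x -> y along nonzero off-diagonal entries."""
    inner = [k for k in range(len(omega)) if k not in (x, y)]
    for r in range(len(inner) + 1):
        for mid in permutations(inner, r):
            path = (x, *mid, y)
            if all(omega[a][b] != 0 for a, b in zip(path, path[1:])):
                yield path


def path_contributions(omega, x, y):
    """Per-path terms of the assumed Jones-West covariance decomposition."""
    d = det(omega)
    out = {}
    for path in simple_paths(omega, x, y):
        prod = 1.0
        for a, b in zip(path, path[1:]):
            prod *= omega[a][b]
        sign = (-1) ** (len(path) + 1)
        nodes = set(path)
        out[path] = sign * prod * det(drop(omega, nodes, nodes)) / d
    return out


def covariance(omega, x, y):
    """Entry (x, y) of the inverse of omega, via the cofactor formula."""
    return (-1) ** (x + y) * det(drop(omega, {y}, {x})) / det(omega)


# Hypothetical 3-node precision matrix (symmetric, positive definite).
omega = [[2.0, -1.0, -0.5],
         [-1.0, 2.0, -1.0],
         [-0.5, -1.0, 2.0]]

contribs = path_contributions(omega, 0, 2)
for path, value in contribs.items():
    print(path, round(value, 4))
print("sum of paths:", round(sum(contribs.values()), 4))
print("covariance  :", round(covariance(omega, 0, 2), 4))
```

In this toy network, nodes 0 and 2 are connected both directly and through node 1, and the per-path terms add up to the covariance obtained by inverting the precision matrix. Dividing by the square root of the product of the two nodes' variances would put the terms on the Pearson correlation scale. The path enumeration is brute force and is only practical for very small graphs.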
