Divergence Weighted Independence Graphs for the Exploratory Analysis of Biological Expression Data

Motivation: Understanding biological processes requires tools for the exploratory analysis of multivariate data generated from in vitro and in vivo experiments. Part of such an analysis is to visualise the interrelationships between the observed variables.

Results: We build on recent work using partial correlation, graphical Gaussian models, and stability selection to add divergence weighted independence graphs (DWIGs) to this toolbox. We measure all quantities in information units (bits and millibits), which gives a common quantification of the strength of associations between variables and of the information explained by a fitted graphical model. The marginal mutual information (MI) and the conditional MI between variables directly account for components of the information explained. The conditional MIs are displayed as edge weights in the independence graph of the variables, which makes the complete graph informative about the unique association between each pair of variables. The summary table of the information decomposition ‘total = explained + residual’ provides a simple comparison of the graphical models suggested by different search routines, including stabilised versions.

We demonstrate the relevance of the conditional MI statistics to the graphical model of the data by analysing simulated data from the insulin pathway, for which the ground truth is known. On these data, thresholding the conditional MI statistics to suggest a network performs at least as well as several other network search algorithms. In searching a biological data set for novel insight, we contrast the DWIG from the fitted maximum weight spanning tree with the DWIG from the fitted model of a stabilised ARACNE network. A DWIG can display properties of either the fitted model or the empirical data directly.
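The edge weights described above can be illustrated with a minimal sketch. It assumes jointly Gaussian variables, for which the conditional MI between two variables given all the others equals -0.5 * log2(1 - r^2), where r is their partial correlation. The function names `conditional_mi_millibits` and `threshold_graph` are hypothetical and are not taken from the paper or from any published software; this is one possible reading of the thresholding idea, not the authors' implementation.

```python
import numpy as np


def conditional_mi_millibits(data):
    """Pairwise conditional MI (in millibits) given all other variables.

    Assumption: the columns of `data` are jointly Gaussian, so the
    conditional MI follows from the partial correlation r as
    -0.5 * log2(1 - r**2).
    """
    cov = np.cov(data, rowvar=False)      # columns are variables
    precision = np.linalg.inv(cov)        # inverse covariance matrix
    scale = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial_corr, 0.0)   # no self-edges
    # Clip to keep the logarithm finite for near-perfect associations.
    r2 = np.clip(partial_corr ** 2, 0.0, 1.0 - 1e-12)
    cmi_bits = -0.5 * np.log2(1.0 - r2)
    return 1000.0 * cmi_bits              # bits -> millibits


def threshold_graph(cmi_mbits, threshold_mbits):
    """Boolean adjacency matrix: keep edges whose weight exceeds the threshold."""
    return cmi_mbits > threshold_mbits
```

Applied to a data matrix with one column per variable, `conditional_mi_millibits` returns a symmetric weight matrix that could label the edges of an independence graph, and `threshold_graph` turns it into a candidate network. The threshold value is a free choice in this sketch.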
