Learning to Discover Graphical Model Structures

We consider structure discovery of undirected graphical models from observational data. Inferring likely structures from few examples is a complex task that often requires formulating priors and sophisticated inference procedures. In the setting of Gaussian Graphical Models (GGMs), a popular estimator maximizes a penalized likelihood objective over the precision matrix. Adapting this objective to capture domain-specific knowledge, whether as priors or as a new data likelihood, requires great effort. In addition, structure recovery is only an indirect consequence of the data-fit term. By contrast, it may be easier to generate training samples of data that arise from graphs with the desired properties. We propose to leverage this latter source of information to learn a function that maps empirical covariance matrices to estimated graph structures. Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored more directly to the specific problem of edge structure discovery. We apply this framework to several real-world problems in structure discovery and show that it can be competitive with standard approaches such as graphical lasso, at a fraction of the execution time. We use convolutional neural networks to parametrize our estimators, motivated by the compositional block structure of matrix inversion. Experimentally, our learnable graph-discovery method, trained on synthetic data, generalizes well to different data: it identifies relevant edges in real data that are completely unknown at training time. On genetics, brain imaging, and simulation data, we obtain performance that is competitive with, and often superior to, that of analytical methods.
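
As an illustration of the training-data idea described above, the following is a minimal sketch of how one (empirical covariance, graph) pair might be generated. It is not the authors' procedure: the function name `sample_training_pair`, the Erdős–Rényi-style sparsity pattern, the uniform edge weights, and the diagonal-dominance trick used to obtain a valid precision matrix are all assumptions made for this example.

```python
import numpy as np

def sample_training_pair(p=20, n=30, edge_prob=0.1, rng=None):
    """Return (empirical correlation matrix, true adjacency) for one random GGM.

    Hypothetical helper: the graph prior and weight distribution are assumptions.
    """
    rng = np.random.default_rng(rng)

    # Random symmetric sparsity pattern with no self-loops.
    upper = np.triu(rng.random((p, p)) < edge_prob, k=1)
    adjacency = upper | upper.T

    # Random symmetric edge weights, kept only on the chosen support.
    weights = np.triu(rng.uniform(-1.0, 1.0, size=(p, p)), k=1)
    weights = (weights + weights.T) * adjacency

    # Strict diagonal dominance makes the precision matrix positive definite.
    precision = weights + np.diag(np.abs(weights).sum(axis=1) + 1.0)
    covariance = np.linalg.inv(precision)

    # Draw n zero-mean Gaussian samples via a Cholesky factor of the covariance.
    chol = np.linalg.cholesky(covariance)
    samples = rng.standard_normal((n, p)) @ chol.T

    # Normalize the empirical covariance to a correlation matrix (network input).
    emp_cov = np.cov(samples, rowvar=False)
    scale = np.sqrt(np.diag(emp_cov))
    emp_corr = emp_cov / np.outer(scale, scale)
    return emp_corr, adjacency

emp_corr, adjacency = sample_training_pair(p=10, n=50, rng=0)
```

Under these assumptions, many such pairs could serve as supervised examples for a network that maps the matrix `emp_corr` to edge probabilities; changing the graph prior in the first step is where domain knowledge about the desired structure would enter.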
