Learning to Discover Sparse Graphical Models

We consider structure discovery of undirected graphical models from observational data. Inferring likely structures from few examples is a complex task that often requires formulating priors and sophisticated inference procedures. Popular methods rely on estimating a penalized maximum likelihood of the precision matrix. However, in these approaches structure recovery is an indirect consequence of the data-fit term, the penalty can be difficult to adapt to domain-specific knowledge, and the inference is computationally demanding. By contrast, it may be easier to generate training samples of data that arise from graphs with the desired structural properties. We propose to leverage this latter source of information as training data to learn a function, parametrized by a neural network, that maps empirical covariance matrices to estimated graph structures. Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery rather than to maximizing data likelihood. Applying this framework, we find that our learnable graph-discovery method, trained on synthetic data, generalizes well: it identifies relevant edges in both synthetic and real data that are completely unknown at training time. On genetics, brain imaging, and simulation data, we obtain performance generally superior to analytical methods.
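
The idea of generating training data from graphs with desired structural properties can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_training_pair`, the parameter `edge_prob`, the independent-edge (Erdős-Rényi-style) sparsity prior, the uniform edge weights, and the diagonal-dominance construction are all assumptions made here for illustration. Each call draws a sparse precision matrix, samples Gaussian data from it, and returns the empirical covariance matrix (the network input) together with the binary edge structure (the training target).

```python
import numpy as np


def sample_training_pair(p=10, n=50, edge_prob=0.2, rng=None):
    """Return (empirical covariance, binary adjacency) for one random sparse graph.

    Hypothetical helper: the graph prior and weight distribution are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Random symmetric sparsity pattern with independent edges (assumed prior).
    upper = np.triu(rng.random((p, p)) < edge_prob, k=1)
    adjacency = upper | upper.T

    # Random symmetric edge weights, kept only on the sampled support.
    weights = np.triu(rng.uniform(-1.0, 1.0, size=(p, p)), k=1)
    weights = (weights + weights.T) * adjacency

    # Strict diagonal dominance makes the precision matrix positive definite.
    precision = weights + np.diag(np.abs(weights).sum(axis=1) + 1.0)

    # Invert to get the covariance; symmetrize to remove round-off asymmetry.
    covariance = np.linalg.inv(precision)
    covariance = (covariance + covariance.T) / 2.0

    # Draw n observations and form the empirical covariance (p x p).
    samples = rng.multivariate_normal(np.zeros(p), covariance, size=n)
    empirical = np.cov(samples, rowvar=False)
    return empirical, adjacency.astype(int)


if __name__ == "__main__":
    emp, adj = sample_training_pair(p=8, n=40, rng=np.random.default_rng(0))
    print(emp.shape, adj.sum() // 2, "edges")
```

Many such pairs, drawn with a graph prior matched to the target domain, could then serve as supervised examples for a network that predicts the edge structure from the empirical covariance matrix.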
