High-Dimensional Gaussian Graphical Model Selection: Tractable Graph Families

We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm when the number of samples satisfies n = Ω(J_min^{-2} log p), where p is the number of variables and J_min is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency.
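To make the thresholding idea concrete, here is a minimal Python sketch of conditional covariance thresholding, not the authors' implementation: the names `threshold_graph_estimate` and `conditional_covariance` are illustrative, and `xi` and `eta` stand in for the threshold and the bound on the local separator size. For each pair of variables it takes the minimum empirical conditional covariance over all small conditioning sets and keeps the edge only if that minimum stays above the threshold.

```python
import numpy as np
from itertools import combinations


def conditional_covariance(S_hat, i, j, cond):
    """Empirical conditional covariance Sigma(i, j | cond) of a Gaussian:
    Sigma(i, j) - Sigma(i, cond) Sigma(cond, cond)^{-1} Sigma(cond, j)."""
    if not cond:
        return S_hat[i, j]
    cond = list(cond)
    A = S_hat[np.ix_(cond, cond)]
    return S_hat[i, j] - S_hat[i, cond] @ np.linalg.solve(A, S_hat[cond, j])


def threshold_graph_estimate(X, xi, eta):
    """Estimate the graph by thresholding minimal empirical conditional
    covariances over all conditioning sets of size at most eta.

    X   : (n, p) matrix of samples
    xi  : threshold on the absolute conditional covariance
    eta : maximum size of the conditioning (separator) set
    Returns a (p, p) boolean adjacency matrix."""
    n, p = X.shape
    S_hat = np.cov(X, rowvar=False)  # empirical covariance matrix
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        for j in range(i + 1, p):
            rest = [k for k in range(p) if k not in (i, j)]
            # Minimum conditional covariance over all small conditioning
            # sets; an edge (i, j) is kept only if no such set explains
            # the dependence away.
            cc_min = min(
                abs(conditional_covariance(S_hat, i, j, S))
                for r in range(eta + 1)
                for S in combinations(rest, r)
            )
            adj[i, j] = adj[j, i] = cc_min > xi
    return adj
```

The brute-force search over conditioning sets costs O(p^eta) per pair, which is tractable precisely because the sufficient conditions above only require sparse local vertex separators, so a small `eta` suffices.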
