On Learning Discrete Graphical Models using Group-Sparse Regularization

We study the problem of learning the graph structure associated with a general discrete graphical model (each variable can take any of m > 1 values, and the clique factors have maximum size c ≥ 2) from samples, under high-dimensional scaling where the number of variables p could be larger than the number of samples n. We provide a quantitative consistency analysis of a procedure based on node-wise multi-class logistic regression with group-sparse regularization. We first consider general m-ary pairwise models, where each factor depends on at most two variables. We show that when the number of samples scales as n > K(m−1)^2 d^2 log((m−1)^2 (p−1)), where d is the maximum degree and K a fixed constant, the procedure succeeds in recovering the graph with high probability. For general models with c-way factors, the natural multi-way extension of the pairwise method quickly becomes computationally intractable, so we study the effectiveness of the pairwise method even when the true model has higher-order factors. Surprisingly, we show that under slightly more stringent conditions, the pairwise procedure still recovers the graph structure when the samples scale as n > K(m−1)^2 d^{3(c−1)/2} log((m−1)^c (p−1)^{c−1}).
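The following is a minimal illustrative sketch (not the authors' implementation) of the node-wise procedure the abstract describes: each node is regressed on all other nodes via multi-class logistic regression, with a group-lasso penalty applied to the (m−1) × (m−1) block of parameters linking the node to each potential neighbor, and the graph is read off from the surviving blocks. The function names, the proximal-gradient solver, and the values of the regularization level lam, step size, iteration count, and thresholding tolerance are assumptions made for illustration only.

```python
import numpy as np


def one_hot_others(X, s, m):
    """One-hot encode every column except s, dropping level 0 as the reference."""
    n, p = X.shape
    others = [t for t in range(p) if t != s]
    Z = np.zeros((n, len(others) * (m - 1)))
    for j, t in enumerate(others):
        for k in range(1, m):
            Z[:, j * (m - 1) + k - 1] = (X[:, t] == k)
    return Z, others


def fit_node(X, s, m, lam, step=0.1, iters=2000):
    """Group-sparse multinomial logistic regression of node s on all other nodes.

    X holds integer labels in {0, ..., m-1}.  Returns the estimated neighborhood
    of s: the nodes whose (m-1) x (m-1) parameter block survives the penalty.
    """
    n = X.shape[0]
    Z, others = one_hot_others(X, s, m)          # n x (p-1)(m-1) design matrix
    Y = np.eye(m)[X[:, s]][:, 1:]                # n x (m-1); class 0 is the reference
    Theta = np.zeros((Z.shape[1], m - 1))
    b = np.zeros(m - 1)
    for _ in range(iters):
        logits = np.hstack([np.zeros((n, 1)), Z @ Theta + b])
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        R = P[:, 1:] - Y                         # residuals for classes 1..m-1
        Theta -= step * (Z.T @ R) / n            # gradient step on the negative log-likelihood
        b -= step * R.mean(axis=0)
        for j in range(len(others)):             # proximal step: block soft-thresholding
            blk = slice(j * (m - 1), (j + 1) * (m - 1))
            norm = np.linalg.norm(Theta[blk])
            if norm > 0:
                Theta[blk] *= max(0.0, 1.0 - step * lam / norm)
    return {others[j] for j in range(len(others))
            if np.linalg.norm(Theta[j * (m - 1):(j + 1) * (m - 1)]) > 1e-6}


def recover_graph(X, m, lam):
    """Combine the node-wise neighborhoods into an edge set (OR rule, for illustration)."""
    p = X.shape[1]
    nbrs = [fit_node(X, s, m, lam) for s in range(p)]
    return {tuple(sorted((s, t))) for s in range(p) for t in nbrs[s]}
```

For example, `recover_graph(X, m=3, lam=0.05)` on an n × p array of ternary observations returns a set of undirected edges; whether neighborhoods are combined with an OR or an AND rule is a standard design choice in such neighborhood-selection schemes.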
