On the Use of Variational Inference for Learning Discrete Graphical Model

We study the general class of estimators for graphical model structure based on optimizing l1-regularized approximate log-likelihood, where the approximate likelihood uses tractable variational approximations of the partition function. We provide a message-passing algorithm that directly computes the l1 regularized approximate MLE. Further, in the case of certain reweighted entropy approximations to the partition function, we show that surprisingly the l1 regularized approximate MLE estimator has a closed-form, so that we would no longer need to run through many iterations of approximate inference and message-passing. Lastly, we analyze this general class of estimators for graph structure recovery, or its sparsistency, and show that it is indeed sparsistent under certain conditions.

[1]  E. Ising Beitrag zur Theorie des Ferromagnetismus , 1925 .

[2]  Andrzej Stachurski,et al.  Parallel Optimization: Theory, Algorithms and Applications , 2000, Parallel Distributed Comput. Pract..

[3]  Martin J. Wainwright,et al.  Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell _{1}$ -Constrained Quadratic Programming (Lasso) , 2009, IEEE Transactions on Information Theory.

[4]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[5]  Martin S. Kochmanski NOTE ON THE E. ISING'S PAPER ,,BEITRAG ZUR THEORIE DES FERROMAGNETISMUS" (Zs. Physik, 31, 253 (1925)) , 2008 .

[6]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[7]  Terence Tao,et al.  The Dantzig selector: Statistical estimation when P is much larger than n , 2005, math/0506081.

[8]  Andrea Montanari,et al.  Which graphical models are difficult to learn? , 2009, NIPS.

[9]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .

[10]  Y. Censor,et al.  Parallel Optimization: Theory, Algorithms, and Applications , 1997 .

[11]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[12]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[13]  Martin J. Wainwright,et al.  A new class of upper bounds on the log partition function , 2002, IEEE Transactions on Information Theory.

[14]  M. Wainwright,et al.  HIGH-DIMENSIONAL COVARIANCE ESTIMATION BY MINIMIZING l1-PENALIZED LOG-DETERMINANT DIVERGENCE BY PRADEEP RAVIKUMAR , 2009 .

[15]  James M. Ortega,et al.  Iterative solution of nonlinear equations in several variables , 2014, Computer science and applied mathematics.

[16]  Joel A. Tropp,et al.  Just relax: convex programming methods for identifying sparse signals in noise , 2006, IEEE Transactions on Information Theory.

[17]  Martin J. Wainwright,et al.  Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting , 2006, J. Mach. Learn. Res..

[18]  Tom Heskes,et al.  Convexity Arguments for Efficient Minimization of the Bethe and Kikuchi Free Energies , 2006, J. Artif. Intell. Res..

[19]  W. Freeman,et al.  Generalized Belief Propagation , 2000, NIPS.

[20]  J. Lafferty,et al.  High-dimensional Ising model selection using ℓ1-regularized logistic regression , 2010, 1010.0311.

[21]  Yair Weiss,et al.  MAP Estimation, Linear Programming and Belief Propagation with Convex Free Energies , 2007, UAI.

[22]  Daphne Koller,et al.  Efficient Structure Learning of Markov Networks using L1-Regularization , 2006, NIPS.

[23]  Martin J. Wainwright,et al.  Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching , 2003, AISTATS.