EDML for Learning Parameters in Directed and Undirected Graphical Models

EDML is a recently proposed algorithm for learning parameters in Bayesian networks. It was originally derived in terms of approximate inference on a meta-network, which underlies the Bayesian approach to parameter estimation. While this initial derivation helped discover EDML in the first place and provided a concrete context for identifying some of its properties (e.g., in contrast to EM), the formal setting was somewhat tedious, given the number of concepts it drew on. In this paper, we propose a greatly simplified perspective on EDML, which casts it as a general approach to continuous optimization. The new perspective has several advantages. First, it makes immediate some results that were previously non-trivial to prove. Second, it facilitates the design of EDML algorithms for new graphical models, leading to a new algorithm for learning parameters in Markov networks. We derive this algorithm in this paper and show, empirically, that it can sometimes learn estimates more efficiently from complete data than commonly used optimization methods, such as conjugate gradient and L-BFGS.
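To make the baseline concrete: the "commonly used optimization methods" mentioned above optimize the log-likelihood of a log-linear Markov network, whose gradient is the gap between empirical and model feature expectations. The sketch below is a minimal, hypothetical illustration (plain gradient ascent on a two-variable network with exact enumeration, standing in for conjugate gradient or L-BFGS); the features and data are made up for the example and are not from the paper.

```python
# Hedged sketch: maximum-likelihood parameter learning in a tiny log-linear
# Markov network by gradient ascent, a simple stand-in for the conjugate
# gradient / L-BFGS baselines mentioned in the abstract. All feature and
# data choices here are illustrative assumptions, not from the paper.
import itertools
import math

# Two binary variables; three indicator features (hypothetical choices).
states = list(itertools.product([0, 1], repeat=2))
features = [
    lambda x: float(x[0] == 1),
    lambda x: float(x[1] == 1),
    lambda x: float(x[0] == x[1]),  # pairwise "agreement" feature
]

def model_expectations(theta):
    """Exact feature expectations under p(x) proportional to exp(theta . f(x))."""
    scores = [math.exp(sum(t * f(x) for t, f in zip(theta, features)))
              for x in states]
    Z = sum(scores)  # partition function, by exhaustive enumeration
    probs = [s / Z for s in scores]
    return [sum(p * f(x) for p, x in zip(probs, states)) for f in features]

def fit(data, steps=2000, lr=0.5):
    """Gradient ascent on the average log-likelihood of complete data."""
    n = len(data)
    emp = [sum(f(x) for x in data) / n for f in features]  # empirical moments
    theta = [0.0] * len(features)
    for _ in range(steps):
        exp = model_expectations(theta)
        # Gradient of the average log-likelihood: empirical minus model
        # expectations; at the optimum the two sets of moments coincide.
        theta = [t + lr * (e - m) for t, e, m in zip(theta, emp, exp)]
    return theta

# Toy complete dataset over (X0, X1).
data = [(1, 1)] * 6 + [(0, 0)] * 3 + [(1, 0)] * 1
theta = fit(data)
```

Because the log-likelihood of this exponential-family model is concave, the fitted weights drive the model's feature expectations to match the empirical ones; L-BFGS or conjugate gradient would reach the same optimum in fewer iterations on larger networks, where the partition function must also be approximated rather than enumerated.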
