Empirical Comparison of Greedy Strategies for Learning Markov Networks of Treewidth k

We recently proposed the Edgewise Greedy Algorithm (EGA) for learning a decomposable Markov network of treewidth k that approximates a given joint probability distribution of n discrete random variables. The main ingredient of our algorithm is the stepwise forward selection algorithm (FSA) due to Deshpande, Garofalakis, and Jordan. EGA is an efficient alternative to the algorithm (HGA) by Malvestuto, which constructs a model of treewidth k by selecting hyperedges of order k+1. In this paper, we present the results of empirical studies that compare HGA, EGA, and FSA-K, a straightforward application of FSA, in terms of approximation accuracy (measured by KL-divergence) and computational time. Our experiments show that (1) on average, all three algorithms produce similar approximation accuracy; (2) EGA produces comparable or better approximation accuracy and is the most efficient of the three; (3) Malvestuto's algorithm is the least efficient, although it tends to produce better accuracy when the treewidth is greater than half the number of random variables; and (4) EGA coupled with local search has the best approximation accuracy overall, at the cost of a 50 percent increase in computation time.
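To make the accuracy measure concrete, the following is a minimal sketch of how the KL-divergence between a joint distribution and a decomposable approximation can be computed. It is not the paper's code: the function names `kl_divergence` and `chain_approximation`, the three-variable example, and the direction KL(p || q) are assumptions chosen for illustration. The sketch uses the standard factorization of a decomposable model as the product of clique marginals divided by the product of separator marginals, here for the fixed chain A - B - C (treewidth 1); it does not perform any structure search such as HGA, EGA, or FSA-K.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) in nats for two arrays of equal shape that each sum to 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # cells with p = 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def chain_approximation(p):
    """Project a joint table p[a, b, c] onto the chain A - B - C.

    Hypothetical helper: cliques are {A, B} and {B, C}, the separator is {B},
    so q(a, b, c) = p(a, b) * p(b, c) / p(b).
    """
    p = np.asarray(p, dtype=float)
    p_ab = p.sum(axis=2)       # clique marginal over {A, B}
    p_bc = p.sum(axis=0)       # clique marginal over {B, C}
    p_b = p.sum(axis=(0, 2))   # separator marginal over {B}
    safe_b = np.where(p_b > 0, p_b, 1.0)  # avoid 0/0; numerator is 0 there
    return p_ab[:, :, None] * p_bc[None, :, :] / safe_b[None, :, None]


# Usage: a random joint distribution over three binary variables.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()
q = chain_approximation(p)
divergence = kl_divergence(p, q)  # non-negative; 0 only if p already factorizes
```

A structure-learning algorithm would compare such divergences across candidate graphs of treewidth k; this sketch only evaluates one fixed graph.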

[1]  Francesco M. Malvestuto, et al.  Approximating discrete probability distributions with decomposable models, 1991, IEEE Trans. Syst. Man Cybern.

[2]  Nathan Srebro, et al.  Methods and Experiments With Bounded Tree-width Markov Networks, 2004.

[3]  Michael I. Jordan, et al.  Efficient Stepwise Selection in Decomposable Models, 2001, UAI.

[4]  David R. Karger, et al.  Learning Markov networks: maximum bounded tree-width graphs, 2001, SODA '01.

[5]  J. R. S. Blair, et al.  Introduction to Chordal Graphs and Clique Trees, in Graph Theory and Sparse Matrix Computation, 1997.

[6]  Jeff A. Bilmes, et al.  PAC-learning Bounded Tree-width Graphical Models, 2004, UAI.

[7]  Yang Xiang, et al.  A “Microscopic” Study of Minimum Entropy Search in Learning Decomposable Markov Networks, 2004, Machine Learning.

[8]  Hans L. Bodlaender, et al.  A Partial k-Arboretum of Graphs with Bounded Treewidth, 1998, Theor. Comput. Sci.

[9]  Michael I. Jordan.  Graphical Models, 1998.

[10]  Thomas M. Cover, et al.  Elements of Information Theory, 2005.

[11]  Judea Pearl, et al.  Probabilistic reasoning in intelligent systems, 1988.

[12]  Michael I. Jordan, et al.  Thin Junction Trees, 2001, NIPS.

[13]  C. N. Liu, et al.  Approximating discrete probability distributions with dependence trees, 1968, IEEE Trans. Inf. Theory.

[14]  Robert M. Haralick, et al.  Approximating high dimensional probability distributions, 2004, ICPR 2004.

[15]  M. J. Norušis, et al.  SPSS 14.0 Advanced Statistical Procedures Companion, 2005.

[16]  Michel Habib, et al.  Chordal Graphs and Their Clique Graphs, 1995, WG.

[17]  B. Peyton, et al.  An Introduction to Chordal Graphs and Clique Trees, 1993.

[18]  Donald J. Rose, et al.  On simple characterizations of k-trees, 1974, Discret. Math.

[19]  Robert M. Haralick, et al.  Practical aspects of efficient forward selection in decomposable graphical models, 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.

[20]  P. Green, et al.  Decomposable graphical Gaussian model determination, 1999.