Learning Thin Junction Trees via Graph Cuts

Structure learning algorithms usually focus on the compactness of the learned model. However, for general compact models, both exact and approximate inference are still NP-hard. Therefore, the focus only on compactness leads to learning models that require approximate inference techniques, thus reducing their prediction quality. In this paper, we propose a method for learning an attractive class of models: bounded-treewidth junction trees, which permit both compact representation of probability distributions and efficient exact inference. Using Bethe approximation of the likelihood, we transform the problem of finding a good junction tree separator into a minimum cut problem on a weighted graph. Using the graph cut intuition, we present an efficient algorithm with theoretical guarantees for finding good separators, which we recursively apply to obtain a thin junction tree. Our extensive empirical evaluation demonstrates the benefit of applying exact inference using our models to answer queries. We also extend our technique to learning low tree-width conditional random fields, and demonstrate significant improvements over state of the art block-L1 regularization techniques.

[1]  W. Freeman,et al.  Bethe free energy, Kikuchi approximations, and belief propagation algorithms , 2001 .

[2]  Ben Taskar,et al.  Learning Probabilistic Models of Relational Structure , 2001, ICML.

[3]  Mark W. Schmidt,et al.  Structure learning in random fields for heart motion abnormality detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Carlos Guestrin,et al.  Efficient Principled Learning of Thin Junction Trees , 2007, NIPS.

[5]  Gregory F. Cooper,et al.  The ALARM Monitoring System: A Case Study with two Probabilistic Inference Techniques for Belief Networks , 1989, AIME.

[6]  Michael I. Jordan,et al.  Thin Junction Trees , 2001, NIPS.

[7]  C. N. Liu,et al.  Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.

[8]  David R. Karger,et al.  Learning Markov networks: maximum bounded tree-width graphs , 2001, SODA '01.

[9]  David J. Spiegelhalter,et al.  Probabilistic Networks and Expert Systems , 1999, Information Science and Statistics.

[10]  Daphne Koller,et al.  Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks , 2005, UAI.

[11]  Andreas Krause,et al.  Near-optimal Nonmyopic Value of Information in Graphical Models , 2005, UAI.

[12]  Pedro M. Domingos,et al.  Learning Arithmetic Circuits , 2008, UAI.

[13]  Wei Hong,et al.  Model-Driven Data Acquisition in Sensor Networks , 2004, VLDB.

[14]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.