Because maximum-likelihood training is intractable for general factor graphs, an appealing alternative is local training, which approximates the likelihood gradient without performing global propagation on the graph. We discuss two new local training methods: shared-unary piecewise, in which each unary factor is shared among all of the higher-way factors it neighbors, and the one-step cutout method, which computes exact marginals on overlapping subgraphs. Comparing them to naive piecewise training, we show that just as piecewise training corresponds to using the Bethe pseudomarginals after zero iterations of belief propagation (BP), shared-unary piecewise corresponds to the pseudomarginals after one parallel BP iteration, and the one-step cutout method corresponds to the beliefs after two iterations. We show in simulations that this point of view illuminates the errors made by shared-unary piecewise.
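To make the iteration-count correspondence concrete, here is a minimal sketch (not the paper's implementation) of parallel sum-product BP on a small binary chain. All names (`beliefs_after`, `unary`, `pairwise`) and the example potentials are hypothetical, and for brevity it tracks only single-variable beliefs, one component of the Bethe pseudomarginals. Under these assumptions, k = 0 recovers the normalized unary factors, while k = 1 and k = 2 yield the beliefs that the abstract associates with shared-unary piecewise and the one-step cutout method, respectively.

```python
import numpy as np

def beliefs_after(unary, pairwise, k):
    """Single-variable Bethe beliefs after k parallel sum-product iterations.

    unary:    list of n length-2 arrays, one local factor per binary variable
    pairwise: list of n-1 2x2 arrays; pairwise[i][a, b] scores (x_i=a, x_{i+1}=b)
    """
    n = len(unary)
    msg_r = [np.ones(2) for _ in pairwise]  # factor i -> variable i+1
    msg_l = [np.ones(2) for _ in pairwise]  # factor i -> variable i
    for _ in range(k):
        new_r, new_l = [], []
        for i, f in enumerate(pairwise):
            # Parallel schedule: variable-to-factor messages are built from
            # the unary factor and the *previous* round's factor messages.
            in_left = unary[i] * (msg_r[i - 1] if i > 0 else 1.0)
            in_right = unary[i + 1] * (msg_l[i + 1] if i + 1 < n - 1 else 1.0)
            new_r.append(f.T @ in_left)   # marginalize out x_i
            new_l.append(f @ in_right)    # marginalize out x_{i+1}
        msg_r, msg_l = new_r, new_l
    beliefs = []
    for v in range(n):
        b = unary[v].astype(float)
        if v > 0:
            b = b * msg_r[v - 1]
        if v < n - 1:
            b = b * msg_l[v]
        beliefs.append(b / b.sum())
    return beliefs

# k = 0: normalized unary factors (the piecewise view); k = 1 and k = 2:
# the beliefs matched above to shared-unary piecewise and one-step cutout.
unary = [np.array([0.7, 0.3]), np.array([0.4, 0.6]), np.array([0.5, 0.5])]
pairwise = [np.array([[0.9, 0.1], [0.1, 0.9]])] * 2
for k in range(3):
    print(k, [b.round(3) for b in beliefs_after(unary, pairwise, k)])
```

Note the deliberately synchronous update: using only the previous round's messages is what makes "number of parallel iterations" a well-defined quantity to match against each local training method.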