Belief Propagation in Conditional RBMs for Structured Prediction

Restricted Boltzmann machines (RBMs) and conditional RBMs (CRBMs) are popular models for a wide range of applications. In previous work, learning on such models has been dominated by contrastive divergence (CD) and its variants. Belief propagation (BP) algorithms are believed to be slow for structured prediction on conditional RBMs (e.g., Mnih et al. [2011]) and not as good as CD when applied to learning (e.g., Larochelle et al. [2012]). In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs that scales easily to tens of thousands of visible and hidden units. We demonstrate that, in both maximum-likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than state-of-the-art CD methods on structured prediction problems. We also include practical guidelines for training CRBMs with BP, and some insights into the interaction of learning and inference algorithms for CRBMs.
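
As a rough illustration of how BP message passing on an RBM's bipartite graph can be written purely in terms of matrix operations, the sketch below runs damped sum-product BP on a binary RBM in NumPy: messages are stored as dense log-odds matrices, so each update reduces to elementwise softplus operations and sums over arrays the size of the weight matrix. The function name, the damping constant, and the fixed iteration count are illustrative assumptions rather than the paper's implementation, and the CRBM conditioning input is assumed to be folded into the effective biases.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def rbm_sum_product_bp(W, a, b, n_iters=50, damping=0.5):
    """Matrix-form sum-product BP on a binary RBM.

    W : (n_v, n_h) weight matrix, a : (n_v,) visible biases, b : (n_h,) hidden biases.
    Returns approximate marginals p(v_i = 1) and p(h_j = 1).
    For a CRBM, the conditioning input is assumed to only shift a and b,
    so the same routine applies once the effective biases are computed.
    """
    n_v, n_h = W.shape
    M = np.zeros((n_v, n_h))  # log-odds messages, visible -> hidden
    N = np.zeros((n_v, n_h))  # log-odds messages, hidden -> visible (N[i, j] is from h_j to v_i)
    for _ in range(n_iters):
        # Cavity field on each visible unit: total incoming field minus the message being replaced.
        eta = (a[:, None] + N.sum(axis=1, keepdims=True)) - N
        M_new = softplus(W + eta) - softplus(eta)
        M = damping * M + (1.0 - damping) * M_new
        # Cavity field on each hidden unit.
        theta = (b[None, :] + M.sum(axis=0, keepdims=True)) - M
        N_new = softplus(W + theta) - softplus(theta)
        N = damping * N + (1.0 - damping) * N_new
    # Beliefs combine the bias with all incoming messages.
    p_v = 1.0 / (1.0 + np.exp(-(a + N.sum(axis=1))))
    p_h = 1.0 / (1.0 + np.exp(-(b + M.sum(axis=0))))
    return p_v, p_h
```

Each sweep touches only a handful of arrays the same size as W, with no per-edge bookkeeping, which is what makes this matrix form practical at the scale of tens of thousands of units mentioned above.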

[1] John W. Fisher et al. Loopy Belief Propagation: Convergence and Effects of Message Errors. J. Mach. Learn. Res., 2005.

[2] Yoshua Bengio et al. Gradient-based learning applied to document recognition. Proc. IEEE, 1998.

[3] Wei Ping et al. Marginal Structured SVM with Hidden Variables. ICML, 2014.

[4] Wei Ping et al. Decomposition Bounds for Marginal MAP. NIPS, 2015.

[5] Qiang Liu et al. Variational algorithms for marginal MAP. J. Mach. Learn. Res., 2011.

[6] Ming-Hsuan Yang et al. Max-Margin Boltzmann Machines for Object Segmentation. CVPR, 2014.

[7] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002.

[8] Thorsten Joachims et al. Learning structural SVMs with latent variables. ICML, 2009.

[9] Geoffrey E. Hinton et al. A Learning Algorithm for Boltzmann Machines. Cogn. Sci., 1985.

[10] Vladlen Koltun et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. NIPS, 2011.

[11] Joseph Gonzalez et al. Residual Splash for Optimally Parallelizing Belief Propagation. AISTATS, 2009.

[12] Stefano Ermon et al. Importance Sampling over Sets: A New Probabilistic Inference Scheme. UAI, 2015.

[13] Nando de Freitas et al. Inductive Principles for Restricted Boltzmann Machine Learning. AISTATS, 2010.

[14] Kazuyuki Tanaka et al. Approximate Learning Algorithm in Boltzmann Machines. Neural Computation, 2009.

[15] Hilbert J. Kappen et al. On the properties of the Bethe approximation and loopy belief propagation on binary networks. 2004.

[16] Yoshua Bengio et al. Classification using discriminative restricted Boltzmann machines. ICML, 2008.

[17] Brendan J. Frey et al. Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory, 2001.

[18] Razvan Pascanu et al. Autotagging music with conditional restricted Boltzmann machines. arXiv, 2011.

[19] Tijmen Tieleman et al. Training restricted Boltzmann machines using approximations to the likelihood gradient. ICML, 2008.

[20] Carsten Peterson et al. A Mean Field Theory Learning Algorithm for Neural Networks. Complex Syst., 1987.

[21] Geoffrey E. Hinton et al. Deep Boltzmann Machines. AISTATS, 2009.

[22] Michael I. Jordan et al. Loopy Belief Propagation for Approximate Inference: An Empirical Study. UAI, 1999.

[23] Hugo Larochelle et al. Learning Algorithms for the Classification Restricted Boltzmann Machine. J. Mach. Learn. Res., 2012.

[24] Geoffrey E. Hinton et al. Restricted Boltzmann machines for collaborative filtering. ICML, 2007.

[25] Geoffrey E. Hinton et al. Modeling Human Motion Using Binary Latent Variables. NIPS, 2006.

[26] Yee Whye Teh et al. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006.

[27] Trevor Darrell et al. Hidden Conditional Random Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.

[28] Xin Li et al. Conditional Restricted Boltzmann Machines for Multi-label Learning with Incomplete Labels. AISTATS, 2015.

[29] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines. Neural Networks: Tricks of the Trade, 2012.

[30] Joris M. Mooij et al. libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models. J. Mach. Learn. Res., 2010.

[31] Martin J. Wainwright et al. Tree-based reparameterization framework for analysis of sum-product and related algorithms. IEEE Trans. Inf. Theory, 2003.

[32] Yee Whye Teh et al. Approximate inference in Boltzmann machines. Artif. Intell., 2003.

[33] Ilya Sutskever et al. On the Convergence Properties of Contrastive Divergence. AISTATS, 2010.

[34] Volodymyr Mnih et al. Conditional Restricted Boltzmann Machines for Structured Output Prediction. UAI, 2011.