Belief Propagation in Conditional RBMs for Structured Prediction

Restricted Boltzmann machines (RBMs) and conditional RBMs (CRBMs) are popular models for a wide range of applications. In previous work, learning on such models has been dominated by contrastive divergence (CD) and its variants. Belief propagation (BP) algorithms are believed to be slow for structured prediction on conditional RBMs (e.g., Mnih et al. [2011]) and not as good as CD when applied to learning (e.g., Larochelle et al. [2012]). In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs that scales easily to tens of thousands of visible and hidden units. We demonstrate that, in both maximum-likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than state-of-the-art CD methods on structured prediction problems. We also include practical guidelines for training CRBMs with BP, and some insights into the interaction of learning and inference algorithms for CRBMs.
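
As a rough illustration of how BP message passing on an RBM's bipartite graph can be written purely in terms of matrix operations, the sketch below runs damped sum-product BP on a binary RBM in NumPy: messages are stored as dense log-odds matrices, so each update reduces to elementwise softplus operations and sums over arrays the size of the weight matrix. The function name, the damping constant, and the fixed iteration count are illustrative assumptions rather than the paper's implementation, and the CRBM conditioning input is assumed to be folded into the effective biases.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def rbm_sum_product_bp(W, a, b, n_iters=50, damping=0.5):
    """Matrix-form sum-product BP on a binary RBM.

    W : (n_v, n_h) weight matrix, a : (n_v,) visible biases, b : (n_h,) hidden biases.
    Returns approximate marginals p(v_i = 1) and p(h_j = 1).
    For a CRBM, the conditioning input is assumed to only shift a and b,
    so the same routine applies once the effective biases are computed.
    """
    n_v, n_h = W.shape
    M = np.zeros((n_v, n_h))  # log-odds messages, visible -> hidden
    N = np.zeros((n_v, n_h))  # log-odds messages, hidden -> visible (N[i, j] is from h_j to v_i)
    for _ in range(n_iters):
        # Cavity field on each visible unit: total incoming field minus the message being replaced.
        eta = (a[:, None] + N.sum(axis=1, keepdims=True)) - N
        M_new = softplus(W + eta) - softplus(eta)
        M = damping * M + (1.0 - damping) * M_new
        # Cavity field on each hidden unit.
        theta = (b[None, :] + M.sum(axis=0, keepdims=True)) - M
        N_new = softplus(W + theta) - softplus(theta)
        N = damping * N + (1.0 - damping) * N_new
    # Beliefs combine the bias with all incoming messages.
    p_v = 1.0 / (1.0 + np.exp(-(a + N.sum(axis=1))))
    p_h = 1.0 / (1.0 + np.exp(-(b + M.sum(axis=0))))
    return p_v, p_h
```

Each sweep touches only a handful of arrays the same size as W, with no per-edge bookkeeping, which is what makes this matrix form practical at the scale of tens of thousands of units mentioned above.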

[1] John W. Fisher et al. Loopy Belief Propagation: Convergence and Effects of Message Errors. J. Mach. Learn. Res., 2005.

[2] Yoshua Bengio et al. Gradient-based learning applied to document recognition. Proc. IEEE, 1998.

[3] Wei Ping et al. Marginal Structured SVM with Hidden Variables. ICML, 2014.

[4] Wei Ping et al. Decomposition Bounds for Marginal MAP. NIPS, 2015.

[5] Qiang Liu et al. Variational algorithms for marginal MAP. J. Mach. Learn. Res., 2011.

[6] Ming-Hsuan Yang et al. Max-Margin Boltzmann Machines for Object Segmentation. CVPR, 2014.

[7] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002.

[8] Thorsten Joachims et al. Learning structural SVMs with latent variables. ICML, 2009.

[9] Geoffrey E. Hinton et al. A Learning Algorithm for Boltzmann Machines. Cogn. Sci., 1985.

[10] Vladlen Koltun et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. NIPS, 2011.

[11] Joseph Gonzalez et al. Residual Splash for Optimally Parallelizing Belief Propagation. AISTATS, 2009.

[12] Stefano Ermon et al. Importance Sampling over Sets: A New Probabilistic Inference Scheme. UAI, 2015.

[13] Nando de Freitas et al. Inductive Principles for Restricted Boltzmann Machine Learning. AISTATS, 2010.

[14] Kazuyuki Tanaka et al. Approximate Learning Algorithm in Boltzmann Machines. Neural Computation, 2009.

[15] Hilbert J. Kappen et al. On the properties of the Bethe approximation and loopy belief propagation on binary networks. 2004.

[16] Yoshua Bengio et al. Classification using discriminative restricted Boltzmann machines. ICML, 2008.

[17] Brendan J. Frey et al. Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory, 2001.

[18] Razvan Pascanu et al. Autotagging music with conditional restricted Boltzmann machines. arXiv, 2011.

[19] Tijmen Tieleman et al. Training restricted Boltzmann machines using approximations to the likelihood gradient. ICML, 2008.

[20] Carsten Peterson et al. A Mean Field Theory Learning Algorithm for Neural Networks. Complex Syst., 1987.

[21] Geoffrey E. Hinton et al. Deep Boltzmann Machines. AISTATS, 2009.

[22] Michael I. Jordan et al. Loopy Belief Propagation for Approximate Inference: An Empirical Study. UAI, 1999.

[23] Hugo Larochelle et al. Learning Algorithms for the Classification Restricted Boltzmann Machine. J. Mach. Learn. Res., 2012.

[24] Geoffrey E. Hinton et al. Restricted Boltzmann machines for collaborative filtering. ICML, 2007.

[25] Geoffrey E. Hinton et al. Modeling Human Motion Using Binary Latent Variables. NIPS, 2006.

[26] Yee Whye Teh et al. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006.

[27] Trevor Darrell et al. Hidden Conditional Random Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.

[28] Xin Li et al. Conditional Restricted Boltzmann Machines for Multi-label Learning with Incomplete Labels. AISTATS, 2015.

[29] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines. Neural Networks: Tricks of the Trade, 2012.

[30] Joris M. Mooij et al. libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models. J. Mach. Learn. Res., 2010.

[31] Martin J. Wainwright et al. Tree-based reparameterization framework for analysis of sum-product and related algorithms. IEEE Trans. Inf. Theory, 2003.

[32] Yee Whye Teh et al. Approximate inference in Boltzmann machines. Artif. Intell., 2003.

[33] Ilya Sutskever et al. On the Convergence Properties of Contrastive Divergence. AISTATS, 2010.

[34] Volodymyr Mnih et al. Conditional Restricted Boltzmann Machines for Structured Output Prediction. UAI, 2011.