Training Bidirectional Helmholtz Machines

Efficient unsupervised training and inference in deep generative models remains a challenging problem. One basic approach, the Helmholtz machine, involves training a top-down directed generative model together with a bottom-up auxiliary model that is trained to help perform approximate inference. Recent results indicate that better generative models can be obtained with more powerful approximate inference procedures. Instead of employing more powerful procedures, we here propose to regularize the generative model so that it stays close to the class of distributions that can be efficiently inverted by the approximate inference model. We achieve this by interpreting both the top-down and the bottom-up directed models as approximate inference distributions and by defining the model distribution to be the geometric mean of the two. We present a lower bound on the likelihood of this model and show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized. We demonstrate that this approach can fit generative models with many layers of hidden binary stochastic variables to complex training distributions, that it prefers significantly deeper architectures, and that it supports approximate inference that is orders of magnitude more efficient than in other approaches.
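
To make the construction above concrete, the following is a minimal sketch in assumed notation (x for observed data, h for latent variables, p for the top-down generative model, q for the bottom-up inference model); the exact formulation in the paper may differ in its details. The model distribution is the normalized geometric mean of the two directed distributions,

\[
p^*(x, h) = \frac{1}{Z} \sqrt{p(x, h)\, q(x, h)},
\qquad
Z = \sum_{x, h} \sqrt{p(x, h)\, q(x, h)} \le 1,
\]

where \(Z \le 1\) follows from the Cauchy-Schwarz inequality because p and q are both normalized. Since \(1/Z \ge 1\), the marginal likelihood admits the lower bound

\[
\log p^*(x) \;\ge\; \log \sum_{h} \sqrt{p(x, h)\, q(x, h)},
\]

whose gap is exactly \(-\log Z\), the Bhattacharyya distance between p and q. Maximizing this bound therefore fits the data while driving the two directed models toward each other, which is the regularization effect described above.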
