Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

We propose a novel method to directly learn a stochastic transition operator whose repeated application provides generated samples. Traditional undirected graphical models approach this problem indirectly by learning a Markov chain model whose stationary distribution obeys detailed balance with respect to a parameterized energy function. The energy function is then modified so that the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge. Moreover, the detailed balance condition is highly restrictive: energy-based models corresponding to neural networks must have symmetric weights, unlike biological neural circuits. In contrast, we develop a method for directly learning arbitrarily parameterized transition operators capable of expressing non-equilibrium stationary distributions that violate detailed balance, thereby enabling us to learn more biologically plausible asymmetric neural networks and more general non-energy-based dynamical systems. The proposed training objective, derived via principled variational methods, encourages the transition operator to "walk back" along multi-step trajectories that start at data points, returning as quickly as possible to the original data points. We present a series of experimental results illustrating the soundness of the proposed approach, Variational Walkback (VW), on the MNIST, CIFAR-10, SVHN and CelebA datasets, demonstrating superior samples compared to earlier attempts to learn a transition operator. We also show that although each rapid training trajectory is limited to a finite but variable number of steps, our transition operator continues to generate good samples well past the length of such trajectories, thereby demonstrating that its non-equilibrium stationary distribution matches the data distribution. Source Code: this http URL
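The abstract describes the approach only at a high level; the following is a minimal, self-contained sketch (in PyTorch rather than the authors' Theano code) of the walkback idea: heat up a short trajectory away from a data point, train a stochastic transition operator to step back toward it, and then generate samples by applying the operator repeatedly, well past the training horizon. The Gaussian-noise heating schedule, the MLP architecture, and the fixed-variance squared-error objective below are illustrative assumptions, not the paper's exact variational objective.

```python
# Minimal sketch (not the authors' code) of a walkback-style training loop.
# Assumptions not taken from the paper: the forward "heating" trajectory is
# approximated by adding Gaussian noise of increasing scale to a data point,
# and the operator is a small MLP predicting the mean of a fixed-variance
# Gaussian over the previous (less noisy) state.
import torch
import torch.nn as nn

dim, n_steps, sigma = 2, 5, 0.1  # toy 2-D data, short training trajectories

class TransitionOperator(nn.Module):
    """Stochastic transition operator p(x_prev | x_curr)."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),          # mean of the Gaussian step
        )

    def forward(self, x):
        return self.net(x)

    def sample(self, x):
        # One stochastic application of the operator.
        return self.forward(x) + sigma * torch.randn_like(x)

op = TransitionOperator(dim)
optim = torch.optim.Adam(op.parameters(), lr=1e-3)

def walkback_loss(x_data):
    """Heat up a trajectory away from the data point, then train the
    operator to walk each step back toward the cleaner state."""
    loss, x = 0.0, x_data
    for t in range(n_steps):
        noise_scale = sigma * (2.0 ** t)             # increasing "temperature"
        x_next = x + noise_scale * torch.randn_like(x)
        # Squared error = fixed-variance Gaussian log-likelihood (up to constants).
        loss = loss + ((op(x_next) - x) ** 2).mean()
        x = x_next
    return loss / n_steps

# Toy training on a two-mode Gaussian mixture.
centers = torch.tensor([[-2.0, 0.0], [2.0, 0.0]])
for step in range(2000):
    idx = torch.randint(0, 2, (256,))
    batch = centers[idx] + 0.1 * torch.randn(256, dim)
    optim.zero_grad()
    walkback_loss(batch).backward()
    optim.step()

# Sampling: repeatedly apply the operator well beyond the training horizon.
with torch.no_grad():
    x = torch.randn(512, dim) * 4.0
    for _ in range(10 * n_steps):
        x = op.sample(x)
```

In this toy setting the samples should concentrate around the two mixture modes even though each training trajectory only lasts a handful of steps, mirroring the abstract's claim that the operator keeps producing good samples past the length of the training trajectories.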
