GSNs: Generative Stochastic Networks

We introduce a novel training principle for generative probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework generalizes Denoising Auto-Encoders (DAE) and is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. The transition distribution is a conditional distribution that generally involves a small move, so it has fewer dominant modes and is unimodal in the limit of small moves. This simplifies the learning problem, making it less like density estimation and more akin to supervised function approximation, with gradients that can be obtained by backprop. The theorems presented here give a probabilistic interpretation for denoising auto-encoders and generalize them; seen in the context of this framework, auto-encoders that learn with injected noise are a special case of GSNs and can be interpreted as generative models. The theorems also provide an interesting justification for dependency networks and generalized pseudolikelihood and define an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent. GSNs can handle missing inputs and can sample subsets of variables given the rest. Experiments validating these theoretical results are conducted on both synthetic and image datasets. The experiments employ a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with backprop through a recurrent neural network with noise injected inside, without the need for layerwise pretraining.
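
To make the core idea concrete, here is a minimal sketch (not the authors' code) of the DAE special case of a GSN on toy 2-D data: a denoiser trained by ordinary backprop defines the transition operator of a Markov chain, and samples are drawn by alternating corruption and reconstruction. The corruption level, network sizes, and helper names such as corrupt and denoiser are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: a mixture of two Gaussian blobs in 2-D.
data = torch.cat([torch.randn(500, 2) * 0.1 + 1.0,
                  torch.randn(500, 2) * 0.1 - 1.0])

sigma = 0.3  # corruption noise level (the "small move" of the chain)

def corrupt(x):
    """C(X_tilde | X): isotropic Gaussian corruption."""
    return x + sigma * torch.randn_like(x)

# The learned transition operator P_theta(X | X_tilde) is a small MLP denoiser.
denoiser = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

# Training: plain supervised regression from the corrupted input back to the
# clean data point (the denoising criterion); gradients come from backprop.
for step in range(2000):
    x = data[torch.randint(0, len(data), (128,))]
    loss = ((denoiser(corrupt(x)) - x) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: run the Markov chain  x -> corrupt(x) -> denoise -> x', whose
# stationary distribution estimates the data distribution.
with torch.no_grad():
    x = torch.zeros(1, 2)
    for _ in range(100):
        x = denoiser(corrupt(x))
    print("sample after 100 transitions:", x)
```

In this sketch the injected noise plays the role it does in the GSN framework: it makes each conditional a local, nearly unimodal move, so the hard multimodal density-estimation problem is replaced by a sequence of easy supervised denoising problems.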
