Deep Directed Generative Autoencoders

For discrete data, the likelihood $P(x)$ can be rewritten exactly and parametrized as $P(X = x) = P(X = x | H = f(x)) P(H = f(x))$, provided $P(X | H)$ has enough capacity to put no probability mass, given $H = f(x)$, on any $x'$ for which $f(x') \neq f(x)$, where $f(\cdot)$ is a deterministic discrete function. The log of the first factor gives rise to the log-likelihood reconstruction error of an autoencoder with $f(\cdot)$ as the encoder and $P(X|H)$ as the (probabilistic) decoder. The log of the second factor can be seen as a regularizer on the encoded activations $h=f(x)$, e.g., as in sparse autoencoders. Both encoder and decoder can be represented by deep neural networks and trained to maximize the average log-likelihood $\log P(x)$ over the data. The objective is to learn an encoder $f(\cdot)$ that maps $X$ to $f(X)$, whose distribution is much simpler than that of $X$ itself and is estimated by $P(H)$. This "flattens the manifold", concentrating probability mass in a smaller number of (relevant) dimensions over which the distribution factorizes. Generating samples from the model is straightforward using ancestral sampling: draw $h \sim P(H)$, then $x \sim P(X | H = h)$. One challenge is that regular back-propagation cannot be used to obtain the gradient on the parameters of the encoder, because its output is discrete; we find that the straight-through estimator works well here. We also find that, although optimizing a single level of such an architecture may be difficult, much better results can be obtained by pre-training such models and stacking them, gradually transforming the data distribution into one that is more easily captured by a simple parametric model.
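
To make the objective and the sampling procedure concrete, below is a minimal single-level sketch written in PyTorch for readability; the paper does not provide this code, and names such as `DeepGenerativeAutoencoder`, the layer sizes, and the choice of factorized Bernoulli distributions for $P(X|H)$ and $P(H)$ are illustrative assumptions. The encoder produces a binary code by deterministic thresholding, the straight-through estimator copies the gradient through that thresholding, the decoder parametrizes $P(X|H)$, and $P(H)$ is a simple factorized prior with learned logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StraightThroughThreshold(torch.autograd.Function):
    """Deterministic binarization h = 1[logits > 0]; the backward pass copies
    the gradient through unchanged (straight-through estimator)."""
    @staticmethod
    def forward(ctx, logits):
        return (logits > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

class DeepGenerativeAutoencoder(nn.Module):
    def __init__(self, x_dim=784, h_dim=200, hidden=500):
        super().__init__()
        # Encoder f(.): deep net followed by deterministic binarization.
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, h_dim),
        )
        # Decoder P(X | H): deep net producing Bernoulli logits over x.
        self.decoder = nn.Sequential(
            nn.Linear(h_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, x_dim),
        )
        # Prior P(H): factorized Bernoulli with learned logits (the simple parametric model).
        self.prior_logits = nn.Parameter(torch.zeros(h_dim))

    def encode(self, x):
        return StraightThroughThreshold.apply(self.encoder(x))

    def loss(self, x):
        h = self.encode(x)                                   # h = f(x), a discrete code
        # -log P(x | h): reconstruction term of the autoencoder.
        recon = F.binary_cross_entropy_with_logits(
            self.decoder(h), x, reduction='none').sum(dim=1)
        # -log P(h): regularizer on the code under the factorized Bernoulli prior.
        log_p1 = F.logsigmoid(self.prior_logits)             # log P(h_i = 1)
        log_p0 = F.logsigmoid(-self.prior_logits)            # log P(h_i = 0)
        prior = -(h * log_p1 + (1.0 - h) * log_p0).sum(dim=1)
        # Average negative log-likelihood (exact under the capacity assumption above).
        return (recon + prior).mean()

    @torch.no_grad()
    def sample(self, n):
        # Ancestral sampling: h ~ P(H), then x ~ P(X | H = h).
        h = torch.bernoulli(torch.sigmoid(self.prior_logits).expand(n, -1))
        return torch.bernoulli(torch.sigmoid(self.decoder(h)))
```

In this sketch, stacking would amount to training a second such model whose inputs are the codes $h = f(x)$ produced by the first, so that the top-level prior only has to capture an increasingly simple distribution.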
