Direct Optimization through arg max for Discrete Variational Auto-Encoder

Reparameterization of variational auto-encoders with continuous random variables is an effective method for reducing the variance of their gradient estimates. In the discrete case, one can perform reparameterization using the Gumbel-Max trick, but the resulting objective relies on an $\arg \max$ operation and is therefore non-differentiable. In contrast to previous works, which resort to softmax-based relaxations, we propose to optimize this objective directly by applying the direct loss minimization approach. Our proposal extends naturally to structured discrete latent variable models whenever the $\arg \max$ operation remains tractable to evaluate. We demonstrate empirically the effectiveness of the direct loss minimization technique on variational auto-encoders with both unstructured and structured discrete latent variables.
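To make the abstract's proposal concrete, here is a minimal NumPy sketch for the unstructured case: a categorical latent variable over $K$ categories is sampled with the Gumbel-Max trick, $z(\theta, \gamma) = \arg\max_k (\theta_k + \gamma_k)$ with $\gamma_k \sim \text{Gumbel}(0, 1)$, and the gradient of $\mathbb{E}_\gamma[f(z)]$ with respect to the logits $\theta$ is estimated, in the spirit of direct loss minimization, as the scaled difference between a loss-perturbed $\arg\max$ and the unperturbed one. The function names, the toy objective values, and the step size below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sample_gumbel(shape, rng):
    """Standard Gumbel(0, 1) noise via the inverse-CDF transform."""
    u = rng.uniform(low=1e-10, high=1.0, size=shape)
    return -np.log(-np.log(u))

def direct_grad_estimate(theta, f_values, eps, rng, num_samples=1000):
    """Monte Carlo estimate of d/dtheta E_gamma[f(z(theta, gamma))].

    theta:     (K,) logits of the categorical variational posterior.
    f_values:  (K,) the downstream objective f evaluated at each one-hot z = e_k
               (tractable here because the latent space is small and unstructured).
    eps:       finite-difference step of the direct estimator.
    """
    K = theta.shape[0]
    grad = np.zeros(K)
    for _ in range(num_samples):
        gamma = sample_gumbel(K, rng)
        # Gumbel-Max sample: z = argmax_k (theta_k + gamma_k)
        k = np.argmax(theta + gamma)
        # Loss-perturbed argmax: each score is shifted by eps * f(e_k)
        k_eps = np.argmax(theta + gamma + eps * f_values)
        # Gradient contribution: (one_hot(k_eps) - one_hot(k)) / eps
        grad[k_eps] += 1.0 / eps
        grad[k] -= 1.0 / eps
    return grad / num_samples

# Toy usage (all values hypothetical).
rng = np.random.default_rng(0)
theta = np.array([0.5, -0.2, 1.0])
f_values = np.array([1.0, 0.0, -1.0])
print(direct_grad_estimate(theta, f_values, eps=0.1, rng=rng))
```

In this sketch $\epsilon$ acts as a bias-variance knob: as $\epsilon \to 0$ the finite difference approaches the true gradient, but the variance of the Monte Carlo estimate grows, so in practice $\epsilon$ is treated as a tunable hyperparameter.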
