Latent Alignment and Variational Attention

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent-variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent-variable models outperform standard neural attention, but these gains go away with hard-attention-based training. In contrast, variational attention retains most of the performance gain while training at a speed comparable to neural attention.
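To make the distinction concrete, the sketch below contrasts, on a single toy attention step, the three objectives the abstract refers to: soft attention (the expectation over alignments is taken inside the likelihood), exact marginalization over the latent alignment, and a single-sample variational lower bound with a score-function (REINFORCE) surrogate. This is a minimal illustration assuming PyTorch; the toy dimensions, the dot-product alignment scores, and the stand-in inference network are assumptions for exposition, not the paper's actual architecture.

```python
import torch
from torch.distributions import Categorical, kl_divergence

torch.manual_seed(0)
T, H, V = 5, 8, 11            # source length, hidden size, vocab size (toy values)
memory = torch.randn(T, H)     # encoder states x_1 ... x_T
query = torch.randn(H)         # decoder state at the current output step
out = torch.nn.Linear(H, V)    # maps a context vector to output-word logits
target = torch.tensor(3)       # the observed output word y at this step

# Alignment distribution p(z | x, query); here a simple dot-product softmax.
scores = memory @ query                       # (T,)
p_align = torch.softmax(scores, dim=-1)

# 1) Soft attention: the expectation is taken *inside* the likelihood,
#    feeding the expected context E_p[x_z] to the predictor.
soft_ctx = p_align @ memory                   # (H,)
soft_ll = Categorical(logits=out(soft_ctx)).log_prob(target)

# 2) Exact latent alignment: marginalize over z,
#    log p(y | x) = log sum_z p(z | x) p(y | x, z); exact, but needs T decoder calls.
word_ll = Categorical(logits=out(memory)).log_prob(target)   # log p(y | x, z) per z
exact_ll = torch.logsumexp(torch.log(p_align) + word_ll, dim=-1)

# 3) Variational attention: a single-sample evidence lower bound (ELBO) with an
#    amortized approximate posterior q(z | x, y). The q logits below are a
#    stand-in for an inference network that also conditions on y (an assumption
#    of this sketch).
q = Categorical(logits=scores + 0.5 * torch.randn(T))
z = q.sample()
elbo = word_ll[z] - kl_divergence(q, Categorical(probs=p_align))
# Score-function (REINFORCE) surrogate: in a real model, backpropagating through
# it gives an unbiased, though high-variance, estimate of the ELBO gradient.
surrogate = elbo + q.log_prob(z) * word_ll[z].detach()

print(f"soft attention log-likelihood : {soft_ll.item():.3f}")
print(f"exact marginal log-likelihood : {exact_ll.item():.3f}")
print(f"single-sample ELBO            : {elbo.item():.3f}")
```

The plain score-function term in step 3 is exactly where the gradient-variance problem mentioned in the abstract arises, and it is the place where the paper's variance-reduction methods would apply.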
