2 Background: Latent Alignment and Neural Attention

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, they do not marginalize over latent alignments in a probabilistic sense. This makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent-variable approach, hard attention, fixes these issues but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of the gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains are lost with hard-attention-based training. Variational attention, on the other hand, retains most of the performance gain while training at a speed comparable to neural attention.
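
The distinction between soft attention, exact marginalization over a latent alignment, and the variational bound can be made concrete with a small numerical sketch. The example below is purely illustrative and not taken from the paper: the toy likelihood `log_lik`, the encoder states `values`, the alignment scores `scores`, and the inference distribution `q` are hypothetical placeholders, and everything is computed with plain NumPy for a single query.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy setup: T source positions, d-dimensional encoder states (illustrative only).
T, d = 5, 4
rng = np.random.default_rng(0)
values = rng.normal(size=(T, d))   # encoder states x_1 .. x_T
scores = rng.normal(size=T)        # unnormalized alignment scores for one query
p_align = softmax(scores)          # prior over the latent alignment, p(z = i)

def log_lik(context):
    # Stand-in for log p(y | context); any smooth function works for this sketch.
    w = np.ones(d) / d
    return -0.5 * np.sum((context - w) ** 2)

# (1) Soft attention: apply the likelihood to the *expected* context vector,
#     log p(y | E_z[x_z]).  No marginalization over z takes place.
soft = log_lik(p_align @ values)

# (2) Exact latent alignment: marginal log-likelihood,
#     log sum_i p(z = i) * p(y | x_i).
per_pos = np.array([log_lik(v) for v in values])
exact = np.log(np.sum(p_align * np.exp(per_pos)))

# (3) Variational attention: ELBO with an (here hypothetical) amortized q(z | x, y),
#     E_q[log p(y | x, z)] - KL(q || p), which lower-bounds (2) for any q.
q = softmax(scores + rng.normal(size=T))
elbo = np.sum(q * per_pos) - np.sum(q * (np.log(q) - np.log(p_align)))

print(f"soft={soft:.3f}  exact log-marginal={exact:.3f}  ELBO={elbo:.3f}")
assert elbo <= exact + 1e-8  # the ELBO never exceeds the true log-marginal
```

In the full models the alignment prior and q would be produced by neural networks, and because z is discrete and cannot be enumerated cheaply at scale, gradients through it would be estimated with score-function (REINFORCE-style) estimators combined with the variance-reduction techniques the paper proposes, rather than computed by enumeration as in this sketch.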
