Fast Structured Decoding for Sequence Models
Zhi-Hong Deng, Di He, Zhuohan Li, Zhiqing Sun, Haoqing Wang, Zi Lin
[1] Jason Lee, et al. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement, 2018, EMNLP.
[2] Andrew McCallum, et al. Maximum Entropy Markov Models for Information Extraction and Segmentation, 2000, ICML.
[3] Aurko Roy, et al. Theory and Experiments on Vector Quantized Autoencoders, 2018, ArXiv.
[4] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[5] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[6] Di He, et al. Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input, 2018, AAAI.
[7] Di He, et al. Non-Autoregressive Machine Translation with Auxiliary Regularization, 2019, AAAI.
[8] Michael Collins, et al. Forward-Backward Algorithm, 2009, Encyclopedia of Biometrics.
[9] Slav Petrov, et al. Globally Normalized Transition-Based Neural Networks, 2016, ACL.
[10] Samy Bengio, et al. Tensor2Tensor for Neural Machine Translation, 2018, AMTA.
[11] Alexandre Allauzen, et al. From n-gram-based to CRF-based Translation Models, 2011, WMT@EMNLP.
[12] Daniel Marcu, et al. Unsupervised Neural Hidden Markov Models, 2016, SPNLP@EMNLP.
[13] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.
[15] Alexander M. Rush, et al. Structured Attention Networks, 2017, ICLR.
[16] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[17] Di He, et al. Hint-based Training for Non-Autoregressive Translation, 2018.
[18] Jindrich Libovický, et al. End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification, 2018, EMNLP.
[19] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.
[20] Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.
[21] Joelle Pineau, et al. An Actor-Critic Algorithm for Sequence Prediction, 2016, ICLR.
[22] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[23] Jason Weston, et al. Natural Language Processing (Almost) from Scratch, 2011, J. Mach. Learn. Res.
[24] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[25] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[26] Victor O. K. Li, et al. Non-Autoregressive Neural Machine Translation, 2017, ICLR.
[27] Marc'Aurelio Ranzato, et al. Classical Structured Prediction Losses for Sequence to Sequence Learning, 2017, NAACL.
[28] Andrew McCallum, et al. An Introduction to Conditional Random Fields, 2010, Found. Trends Mach. Learn.
[29] Alexander M. Rush, et al. Sequence-Level Knowledge Distillation, 2016, EMNLP.
[30] Tie-Yan Liu, et al. Hint-Based Training for Non-Autoregressive Machine Translation, 2019, EMNLP.
[31] Alexander M. Rush, et al. A Tutorial on Deep Latent Variable Models of Natural Language, 2018, ArXiv.
[32] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[33] Aurko Roy, et al. Fast Decoding in Sequence Models using Discrete Latent Variables, 2018, ICML.