论文信息 - Inference Strategies for Machine Translation with Conditional Masking

Inference Strategies for Machine Translation with Conditional Masking

Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance, and investigate a number of simple heuristics motivated by this perspective. We identify a thresholding strategy that has advantages over the standard "mask-predict" algorithm, and provide analyses of its behavior on machine translation tasks.

Colin Cherry | George Foster | Julia Kreutzer

[1] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.

[2] Jason Lee,et al. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement , 2018, EMNLP.

[3] Philipp Koehn,et al. Findings of the 2017 Conference on Machine Translation (WMT17) , 2017, WMT.

[4] Matt Post,et al. A Call for Clarity in Reporting BLEU Scores , 2018, WMT.

[5] Omer Levy,et al. Semi-Autoregressive Training Improves Mask-Predict Decoding , 2020, ArXiv.

[6] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[7] Changhan Wang,et al. Levenshtein Transformer , 2019, NeurIPS.

[8] Alex Wang,et al. A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models , 2019, ArXiv.

[9] Kyunghyun Cho,et al. Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference using a Delta Posterior , 2019, AAAI.

[10] Jakob Uszkoreit,et al. Insertion Transformer: Flexible Sequence Generation via Insertion Operations , 2019, ICML.

[11] Omer Levy,et al. Mask-Predict: Parallel Decoding of Conditional Masked Language Models , 2019, EMNLP.

[12] André F. T. Martins,et al. Learning What’s Easy: Fully Differentiable Neural Easy-First Taggers , 2017, EMNLP.

[13] Aurko Roy,et al. Fast Decoding in Sequence Models using Discrete Latent Variables , 2018, ICML.