Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation

This paper introduces Dynamic Programming Encoding (DPE), a new segmentation algorithm for tokenizing sentences into subword units. We view the subword segmentation of output sentences as a latent variable that should be marginalized out for learning and inference. A mixed character-subword transformer is proposed, which enables exact log marginal likelihood estimation and exact MAP inference to find target segmentations with maximum posterior probability. DPE uses this lightweight mixed character-subword transformer to pre-process parallel data, segmenting output sentences via dynamic programming. Empirical results on machine translation suggest that DPE is effective for segmenting output sentences and can be combined with BPE dropout for stochastic segmentation of source sentences. DPE achieves an average improvement of 0.9 BLEU over BPE (Sennrich et al., 2016) and of 0.55 BLEU over BPE dropout (Provilkov et al., 2019) on several WMT datasets covering translation between English and each of German, Romanian, Estonian, Finnish, and Hungarian.
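The dynamic program underlying DPE admits a compact implementation: a forward pass accumulates the log marginal likelihood over all subword segmentations of the target, while a Viterbi-style max with back-pointers recovers the MAP segmentation. Below is a minimal Python sketch of both recursions under stated assumptions; the names `dpe_chart`, `log_prob`, and `max_len` are illustrative, and `log_prob(context, piece)` stands in for the mixed character-subword transformer's conditional log p(subword | preceding characters), which is not reproduced here.

```python
import math
from typing import Callable, List, Set, Tuple


def _logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def dpe_chart(
    target: str,
    vocab: Set[str],
    log_prob: Callable[[str, str], float],  # log p(subword | preceding characters)
    max_len: int = 8,                       # longest subword considered
) -> Tuple[float, float, List[str]]:
    """Forward DP over all subword segmentations of `target`.

    alpha[j] marginalizes (log-sum-exp) over every segmentation of
    target[:j]; best[j] keeps only the single highest-scoring one
    (Viterbi), with back-pointers for MAP decoding. `vocab` is assumed
    to contain every single character, so a segmentation always exists.
    """
    n = len(target)
    alpha = [-math.inf] * (n + 1)  # log marginal likelihood chart
    best = [-math.inf] * (n + 1)   # Viterbi (MAP) chart
    back = [0] * (n + 1)           # split-point back-pointers
    alpha[0] = best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            piece = target[i:j]
            if piece not in vocab:
                continue
            lp = log_prob(target[:i], piece)
            alpha[j] = _logaddexp(alpha[j], alpha[i] + lp)  # marginalize over splits
            if best[i] + lp > best[j]:                      # maximize over splits
                best[j] = best[i] + lp
                back[j] = i
    pieces, j = [], n
    while j > 0:  # follow back-pointers to recover the MAP segmentation
        pieces.append(target[back[j]:j])
        j = back[j]
    return alpha[n], best[n], pieces[::-1]


# Toy usage with a uniform scorer (every vocabulary piece equally likely):
vocab = {"u", "n", "d", "o", "un", "do", "undo"}
marginal, map_score, seg = dpe_chart(
    "undo", vocab, lambda ctx, piece: math.log(1.0 / len(vocab))
)
print(seg)  # -> ['undo']: the single-piece segmentation wins under this scorer
```

Both charts cost O(n · max_len) scoring calls; in the actual model the conditional scores come from one transformer decoding pass over characters, which is what keeps exact marginalization and exact MAP inference tractable.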

[1] Yoshua Bengio, et al. On Using Very Large Target Vocabulary for Neural Machine Translation, 2014, ACL.

[2] Yu Zhang, et al. Latent Sequence Decompositions, 2016, ICLR.

[3] Jason Lee, et al. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement, 2018, EMNLP.

[4] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.

[5] Chris Dyer, et al. Learning to Discover, Ground and Use Words with Segmental Neural Language Models, 2018, ACL.

[6] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[7] Edouard Grave, et al. Training Hybrid Language Models by Marginalizing over Segmentations, 2019, ACL.

[8] Wang Ling, et al. Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation, 2015, EMNLP.

[9] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.

[10] Chong Wang, et al. Sequence Modeling via Segmentations, 2017, ICML.

[11] Ankur Bapna, et al. Revisiting Character-Based Neural Machine Translation with Capacity and Compression, 2018, EMNLP.

[12] Christopher D. Manning, et al. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models, 2016, ACL.

[13] Navdeep Jaitly, et al. Imputer: Sequence Modelling via Imputation and Dynamic Programming, 2020, ICML.

[14] Elena Voita, et al. BPE-Dropout: Simple and Effective Subword Regularization, 2020, ACL.

[15] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[16] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, arXiv.

[17] Mohammad Norouzi, et al. Optimal Completion Distillation for Sequence Learning, 2018, ICLR.

[18] Taku Kudo, et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, 2018, ACL.

[19] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[20] Artem Sokolov, et al. Learning to Segment Inputs for NMT Favors Character-Level Processing, 2018, IWSLT.

[21] Mohammad Norouzi, et al. Non-Autoregressive Machine Translation with Latent Alignments, 2020, EMNLP.

[22] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[23] Ekaterina Vylomova, et al. Word Representation Models for Morphologically Rich Languages in Neural Machine Translation, 2016, SWCN@EMNLP.

[24] Jason Lee, et al. Fully Character-Level Neural Machine Translation without Explicit Segmentation, 2016, TACL.