An Exploration of Placeholding in Neural Machine Translation

Phrase-based machine translation gives the system developer fine-grained control over machine translation output. One approach to providing similar control in neural machine translation is placeholding (herein called masking), which replaces selected input tokens with mask tokens that are then substituted with the original text in post-processing. But is this a good idea? We undertake an exploration of masking in French–English and Japanese–English using Transformer architectures. We attempt to quantify whether (and where) masking is necessary through analysis of a baseline system, and then explore numerous parameterizations of masking, including post-processing techniques for replacing the masks. Our analysis shows this to be a thorny matter: masks solve some problems but are not perfectly translated themselves.
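
The following is a minimal sketch of the mask-and-restore workflow described above, assuming for illustration that URLs are the placeheld category and that an external translate() function stands in for the NMT system; all names are hypothetical and not drawn from the paper's implementation.

import re

# Numbered mask token inserted into the source in place of protected spans.
MASK = "<mask{}>"

def apply_masks(source, pattern=r"https?://\S+"):
    """Pre-processing: replace matched spans with numbered mask tokens.

    Returns the masked source and the list of original spans, in order.
    """
    spans = []
    def _sub(match):
        spans.append(match.group(0))
        return MASK.format(len(spans) - 1)
    return re.sub(pattern, _sub, source), spans

def restore_masks(translation, spans):
    """Post-processing: put the original spans back in place of the mask tokens."""
    for i, span in enumerate(spans):
        translation = translation.replace(MASK.format(i), span)
    return translation

# Usage, with a hypothetical translate() provided by the NMT system:
#   masked, spans = apply_masks("Voir https://example.com pour les détails.")
#   output = restore_masks(translate(masked), spans)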
