Neural Machine Translation with Inflected Lexicon

The paper presents experiments in neural machine translation with lexical constraints into a morphologically rich language. In particular, we introduce a method, based on constrained decoding, which handles the inflected forms of lexical entries and does not require any modification to the training data or model architecture. To evaluate its effectiveness, we carry out experiments in two different scenarios: general and domain-specific. We compare our method with baseline translation, i.e. translation without lexical constraints, in terms of translation speed and translation quality. To evaluate how well the method handles the constraints, we propose new evaluation metrics which take into account the presence, placement, duplication and inflectional correctness of lexical terms in the output sentence.
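
To make the idea of constraint-aware evaluation more concrete, the following is a minimal sketch of how presence and duplication of lexical terms might be counted when each term may surface in any of several inflected forms. It is not the paper's metric definition: the function name `constraint_stats`, the input format (one set of acceptable single-token forms per constraint), and the two ratios it returns are all assumptions made for illustration. Placement and inflectional correctness are left out because they would need alignment and morphological information that the abstract does not specify.

```python
from collections import Counter


def constraint_stats(output_tokens, constraints):
    """Count how many constraints appear in the output and how many appear more than once.

    output_tokens: list of tokens of the translated sentence.
    constraints:   list of sets; each set holds the acceptable inflected
                   forms of one lexical term (hypothetical format,
                   single-token forms only).
    """
    # Case-insensitive token counts for the output sentence.
    counts = Counter(tok.lower() for tok in output_tokens)

    present = 0
    duplicated = 0
    for forms in constraints:
        # Total occurrences of any accepted form of this term.
        hits = sum(counts[form] for form in {f.lower() for f in forms})
        if hits >= 1:
            present += 1
        if hits > 1:
            duplicated += 1

    total = len(constraints)
    return {
        "presence": present / total if total else 0.0,
        "duplication": duplicated / total if total else 0.0,
    }


# Illustrative data only: two terms, each with a few inflected forms.
example_constraints = [{"kot", "kota", "kotem"}, {"pies", "psa", "psem"}]
example_output = "Widzę kota i kota".split()
stats = constraint_stats(example_output, example_constraints)
```

In this toy input the first term occurs twice in an accepted form and the second term is missing, so a sketch like this would flag both a duplication and an unsatisfied constraint.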
