Incorporating Named Entity Information into Neural Machine Translation

Most neural machine translation (NMT) models take subword-level sequences as input to address the problem of representing out-of-vocabulary (OOV) words. However, using subword units as input may discard information carried by larger text granularities, such as named entities, which leads to a loss of important semantic information. In this paper, we propose a simple but effective method to incorporate named entity (NE) tag information into the Transformer translation system. The encoder of our proposed model takes both the subwords and the NE tags of source sentences as input. Furthermore, we introduce a novel entity-aligned attention mechanism to make full use of the chunk information carried by the NE tags. The proposed approach can be easily integrated into the existing Transformer framework. Experimental results on two public translation tasks demonstrate that our method achieves significant improvements over the baseline Transformer model and also outperforms existing competitive systems.
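
To make the encoder-input idea concrete, the sketch below shows one plausible way to feed both subword IDs and their aligned NE tag IDs into a Transformer encoder by summing two embedding tables. This is only a minimal illustration of the general idea; the fusion by addition, the module and parameter names, and all dimensions are assumptions for demonstration, not the paper's exact formulation or its entity-aligned attention mechanism.

```python
import torch
import torch.nn as nn


class SubwordWithNETagEmbedding(nn.Module):
    """Embed a source sentence from subword IDs plus parallel NE tag IDs.

    Illustrative sketch only: fusing by element-wise addition (analogous to
    adding positional encodings) is an assumption, not the paper's method.
    """

    def __init__(self, vocab_size: int, num_ne_tags: int, d_model: int):
        super().__init__()
        self.subword_emb = nn.Embedding(vocab_size, d_model)
        self.ne_tag_emb = nn.Embedding(num_ne_tags, d_model)

    def forward(self, subword_ids: torch.Tensor, ne_tag_ids: torch.Tensor) -> torch.Tensor:
        # Both inputs have shape (batch, seq_len); the NE tags are aligned
        # one-to-one with subwords (e.g. an entity's tag is repeated over
        # all of its subword pieces).
        return self.subword_emb(subword_ids) + self.ne_tag_emb(ne_tag_ids)


# Toy usage: 2 sentences of length 5, a 1000-subword vocabulary,
# and 9 NE tag types (hypothetical numbers).
emb = SubwordWithNETagEmbedding(vocab_size=1000, num_ne_tags=9, d_model=512)
subwords = torch.randint(0, 1000, (2, 5))
ne_tags = torch.randint(0, 9, (2, 5))
out = emb(subwords, ne_tags)  # (2, 5, 512), fed to the Transformer encoder stack
```

The resulting representations can then be passed to a standard Transformer encoder unchanged, which is consistent with the claim that the approach integrates easily into the existing framework.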
