An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems

Lexical ambiguity is one of many challenging linguistic phenomena in translation: an ambiguous word must be translated with its correct sense. Previous work has shown that the translation quality of neural machine translation systems can be improved by explicitly modeling the senses of ambiguous words. Recently, several evaluation test sets have been proposed to measure the word sense disambiguation (WSD) capability of machine translation systems. To date, however, these test sets do not include training data, so they cannot provide a fair setup in which the sense distributions present in the training data are known and can be measured. In this paper, we present an evaluation benchmark on WSD for machine translation covering 10 language pairs, which includes training data with known sense distributions. The construction of the benchmark builds on the wide-coverage multilingual sense inventory of BabelNet, the multilingual neural parsing pipeline TurkuNLP, and the OPUS collection of translated texts from the web. The test suite is available at http://github.com/Helsinki-NLP/MuCoW.
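To illustrate what "training data with known sense distributions" could mean in practice, the following is a minimal sketch and not the benchmark's actual pipeline. It assumes a hypothetical sense inventory that groups target-language lemmas into sense clusters for each ambiguous source word, and a list of already aligned (source lemma, target lemma) pairs. The inventory contents, the function name `sense_distribution`, and the example pairs are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sense inventory (an assumption for this sketch):
# ambiguous source lemma -> {sense label: set of target lemmas}.
SENSE_INVENTORY = {
    "spring": {
        "season": {"Frühling", "Frühjahr"},
        "water_source": {"Quelle"},
        "coil": {"Feder"},
    },
}


def sense_distribution(aligned_pairs, inventory):
    """Return the relative frequency of each sense per ambiguous source lemma.

    aligned_pairs: iterable of (source_lemma, target_lemma) tuples.
    Pairs whose source lemma is not in the inventory, or whose target
    lemma matches no sense cluster, are ignored.
    """
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        for sense, lemmas in inventory.get(src, {}).items():
            if tgt in lemmas:
                counts[src][sense] += 1
                break
    return {
        src: {sense: n / sum(c.values()) for sense, n in c.items()}
        for src, c in counts.items()
    }


# Toy aligned pairs standing in for lemma alignments from a parallel corpus.
pairs = [
    ("spring", "Frühling"),
    ("spring", "Frühling"),
    ("spring", "Quelle"),
    ("spring", "Feder"),
    ("house", "Haus"),  # not ambiguous in the inventory, so it is skipped
]

dist = sense_distribution(pairs, SENSE_INVENTORY)
print(dist)
```

With such a distribution in hand, test results for a given sense can be read against how often that sense occurred in training, which is the kind of controlled comparison the abstract motivates.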
