Paper Information - Translating Terminological Expressions in Knowledge Bases with Neural Machine Translation

This paper focuses on the translation of terminological expressions represented in semantically structured resources, such as ontologies or knowledge graphs. The challenge of translating ontology labels or terminological expressions documented in knowledge bases lies in their highly specific vocabulary and the lack of contextual information that could guide a machine translation system to translate ambiguous words into the targeted domain. Given these challenges, we evaluate the translation quality of domain-specific expressions in the medical and financial domains with both statistical and neural machine translation methods, and experiment with domain adaptation of the translation models using terminological expressions only. Furthermore, we perform experiments on injecting external terminological expressions into the translation systems. In these experiments, we observed a significant advantage from domain adaptation for the domain-specific resources in the medical and financial domains, as well as a benefit of subword models over word-based neural machine translation models for terminology translation.

Paul Buitelaar | Mihael Arcan | Daniel Torregrosa