Translating Terminological Expressions in Knowledge Bases with Neural Machine Translation

This paper focuses on the translation of terminological expressions represented in semantically structured resources, such as ontologies or knowledge graphs. Translating ontology labels or terminological expressions documented in knowledge bases is challenging because the vocabulary is highly specific and the expressions lack the contextual information that could guide a machine translation system to translate ambiguous words into the targeted domain. To address these challenges, we evaluate the translation quality of domain-specific expressions in the medical and financial domains with both statistical and neural machine translation methods, and we experiment with domain adaptation of the translation models using terminological expressions only. Furthermore, we perform experiments on injecting external terminological expressions into the translation systems. In these experiments, we observed a significant advantage of domain adaptation for the domain-specific resources in the medical and financial domains, as well as a benefit of subword models over word-based neural machine translation models for terminology translation.
