Translating Domain-Specific Expressions in Knowledge Bases with Neural Machine Translation

This paper focuses on the translation of domain-specific expressions represented in semantically structured resources, such as ontologies or knowledge graphs. To make knowledge accessible across language borders, these resources need to be translated into different languages. The challenge of translating labels or terminological expressions represented in ontologies lies in their highly specific vocabulary and the lack of contextual information, which could otherwise guide a machine translation system to translate ambiguous words into the targeted domain. To address these challenges, we translate terminological expressions in the medical and financial domains with both statistical machine translation (SMT) and neural machine translation (NMT) methods. We evaluate the translation quality of domain-specific expressions with translation systems trained on a generic dataset and experiment with domain adaptation using terminological expressions. Furthermore, we perform experiments on the injection of external knowledge into the translation systems. In these experiments, NMT methods showed a clear advantage over SMT in both domain adaptation and terminology injection. Nevertheless, because the terminological expressions are highly specific and unique, subword segmentation within NMT does not outperform a word-based neural translation model.
