METEOR for multiple target languages using DBnary

This paper proposes an extension of METEOR, a well-known MT evaluation metric, for multiple target languages using an in-house lexical resource called DBnary (an extraction from Wiktionary provided to the community as a Multilingual Lexical Linked Open Data). Today, the use of the synonymy module of METEOR is only exploited when English is the target language (use of WordNet). A synonymy module using DBnary would allow its use for the 21 languages (covered up to now) as target languages. The code of this new instance of METEOR, adapted to several target languages, is provided to the community via a github repository. We also show that our DBnary augmented METEOR increases the correlation with human judgements on the WMT 2013 and 2014 metrics dataset for English-to-(French, Russian, German, Spanish) language pairs.

[1]  Alon Lavie,et al.  Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.

[2]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[3]  Nitin Madnani,et al.  TER-Plus: paraphrase, semantic, and alignment enhancements to Translation Edit Rate , 2009, Machine Translation.

[4]  Ondrej Bojar,et al.  Results of the WMT17 Metrics Shared Task , 2017, WMT.

[5]  Marianna Apidianaki,et al.  METEOR-WSD: Improved Sense Matching in MT Evaluation , 2015, SSST@NAACL-HLT.

[6]  Roberto Navigli,et al.  Entity Linking meets Word Sense Disambiguation: a Unified Approach , 2014, TACL.

[7]  Alon Lavie,et al.  METEOR-NEXT and the METEOR Paraphrase Tables: Improved Evaluation Support for Five Target Languages , 2010, WMT@ACL.

[8]  Philipp Cimiano,et al.  Linking Lexical Resources and Ontologies on the Semantic Web with Lemon , 2011, ESWC.

[9]  Helmut Schmidt,et al.  Probabilistic part-of-speech tagging using decision trees , 1994 .

[10]  Cyril Goutte Automatic Evaluation of Machine Translation Quality , 2006 .

[11]  Alon Lavie,et al.  Extending the METEOR Machine Translation Evaluation Metric to the Phrase Level , 2010, NAACL.

[12]  Paul Buitelaar,et al.  LexInfo: A declarative model for the lexicon-ontology interface , 2011, J. Web Semant..

[13]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[14]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[15]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[16]  Nitin Madnani,et al.  Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric , 2009, WMT@EACL.

[17]  Jeremy J. Carroll,et al.  Resource description framework (rdf) concepts and abstract syntax , 2003 .

[18]  Christian Chiarcos,et al.  OLiA - Ontologies of Linguistic Annotation , 2015, Semantic Web.