Capturing Lexical Variation in MT Evaluation Using Automatically Built Sense-Cluster Inventories

The strict character of most existing Machine Translation (MT) evaluation metrics prevents them from capturing lexical variation in translation. Yet a central requirement for MT evaluation metrics is high correlation with human judgments of translation quality, and achieving such correlation depends on identifying sense correspondences between the compared translations. Because most metrics look only for exact matches, their results are often misleading with respect to translation quality. Moreover, existing metrics do not permit a conclusive estimate of the impact of Word Sense Disambiguation techniques on MT systems. In this paper, we show how information acquired by an unsupervised semantic analysis method can be used to make MT evaluation more sensitive to lexical semantics. The sense inventories built by this data-driven method are incorporated into METEOR: they replace WordNet for evaluation in English and render METEOR's synonymy module operable in French. The evaluation results show that using these inventories increases both the number of matches and the correlation with human judgments of translation quality, compared to precision-based metrics.
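To make the matching idea concrete, here is a minimal Python sketch of how an induced sense-cluster inventory could stand in for WordNet in a METEOR-style synonymy matching stage. This is not the authors' implementation: the inventory format, the `synonym_match` and `greedy_matches` helpers, and the example clusters are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's code) of sense-cluster-based synonymy
# matching in a METEOR-like metric. The inventory format is hypothetical:
# each word maps to the set of induced sense-cluster IDs it belongs to.

from typing import Dict, Set, List, Tuple

SenseInventory = Dict[str, Set[str]]

def synonym_match(hyp_word: str, ref_word: str,
                  inventory: SenseInventory) -> bool:
    """True if the words are identical or share an induced sense cluster."""
    if hyp_word == ref_word:              # exact match always counts
        return True
    hyp_clusters = inventory.get(hyp_word, set())
    ref_clusters = inventory.get(ref_word, set())
    return bool(hyp_clusters & ref_clusters)

def greedy_matches(hyp: List[str], ref: List[str],
                   inventory: SenseInventory) -> List[Tuple[int, int]]:
    """Greedily align hypothesis to reference tokens; each reference
    token may be matched at most once, as in METEOR's unigram matching."""
    used: Set[int] = set()
    matches: List[Tuple[int, int]] = []
    for i, h in enumerate(hyp):
        for j, r in enumerate(ref):
            if j not in used and synonym_match(h, r, inventory):
                matches.append((i, j))
                used.add(j)
                break
    return matches

if __name__ == "__main__":
    # Hypothetical clusters induced from parallel data.
    inventory = {"car": {"c17"}, "automobile": {"c17"},
                 "vehicle": {"c17", "c42"}}
    hyp = "the automobile stopped".split()
    ref = "the car stopped".split()
    print(greedy_matches(hyp, ref, inventory))  # [(0, 0), (1, 1), (2, 2)]
```

Under this sketch, swapping the inventory is all it takes to move between languages, which is how an induced resource could enable synonymy matching for French, where the standard METEOR setup has no WordNet equivalent.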
