Automatic Machine Translation Evaluation with Part-of-Speech Information

One open problem in machine translation is the evaluation of its output. The output should be as close to a human reference translation as possible, but variation in word order and the use of synonyms have to be taken into account when measuring the similarity between the two. Conventional methods tend to rely on many external resources, such as synonym dictionaries, paraphrase tables, and textual-entailment data. To keep the evaluation model both accurate and concise, this paper explores evaluation that uses only the part-of-speech (POS) information of the words; that is, the method is based solely on the agreement between the POS sequences of the hypothesis translation and the reference. In the proposed method, the POS tag plays a role similar to that of synonyms, in addition to capturing the syntactic and morphological behaviour of the lexical item in question. Measures of similarity between a machine translation and a human reference are language-pair dependent, since word order and the number of available synonyms may vary; the new measure addresses this problem to a certain extent by introducing weights for the different sources of information. In experiments on English, German and French, the measure correlates on average better with human judgements than several existing metrics, such as BLEU, AMBER and MP4IBM1.
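To make the idea concrete, the following is a minimal, illustrative sketch in Python of a POS-sequence similarity score; it is not the paper's exact formulation. It assumes the hypothesis and reference have already been POS-tagged by an external tagger, compares their tag sequences via weighted n-gram precision and recall, and exposes the n-gram weights and a precision/recall balance parameter as stand-ins for the tunable, language-pair-dependent weights mentioned above. All concrete values, function names and tags are placeholders.

# Sketch of a POS-sequence similarity score for MT evaluation.
# Illustrative only: compares the POS tag sequences of a hypothesis and a
# reference via weighted n-gram precision/recall. POS tags are assumed to be
# produced beforehand by an external tagger; the weights are placeholders.

from collections import Counter

def ngrams(tags, n):
    """Return a multiset (Counter) of POS n-grams of order n."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def pos_ngram_fscore(hyp_tags, ref_tags, n, alpha=1.0):
    """Harmonic-style mean of n-gram precision and recall over POS tags."""
    hyp_ngrams, ref_ngrams = ngrams(hyp_tags, n), ngrams(ref_tags, n)
    overlap = sum((hyp_ngrams & ref_ngrams).values())
    if not hyp_ngrams or not ref_ngrams or overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_ngrams.values())
    recall = overlap / sum(ref_ngrams.values())
    # alpha > 1 favours recall, alpha < 1 favours precision (hypothetical knob).
    return (1 + alpha) * precision * recall / (alpha * precision + recall)

def pos_similarity(hyp_tags, ref_tags, weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted combination of 1- to 4-gram POS F-scores.

    The weights stand in for the language-pair-dependent weighting of
    information sources described in the abstract; the values here are
    arbitrary placeholders, not tuned parameters.
    """
    score = sum(w * pos_ngram_fscore(hyp_tags, ref_tags, n)
                for n, w in enumerate(weights, start=1))
    return score / sum(weights)

if __name__ == "__main__":
    # Toy example with already-tagged sentences (Penn Treebank style tags).
    reference  = ["DT", "NN", "VBZ", "RB", "JJ"]
    hypothesis = ["DT", "JJ", "NN", "VBZ", "RB"]
    print("POS similarity: %.3f" % pos_similarity(hypothesis, reference))

In a real setting, the weights (and the precision/recall balance) would be tuned per language pair against human judgements, which is the role the weighting of information sources plays in the abstract.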
