Application and Analysis of Sentence Similarity Based Machine Translation Evaluation

To support the development of a localization-oriented example-based machine translation (EBMT) system, an automatic machine translation evaluation method is implemented that adopts edit similarity, cosine correlation, and the Dice coefficient as criteria. Experiments show that the evaluation method distinguishes well between translations of different intelligibility and fluency. The mathematical similarity between the Dice coefficient and cosine correlation is analyzed and is also observed in the experiments. To verify the consistency between automatic and human evaluation methods, six machine translation systems are scored by both; the results show consistency between the different evaluation methods. Statistical analysis is performed to validate the experimental results: correlation coefficients are computed, and significance tests at the 99% level confirm the reliability of the results. Linear regression equations are built to map the automatic scoring results onto human scores, and the regression equation is used to predict the human scoring of machine translation systems, with promising results. The experimental results show that the proposed MT evaluation method is applicable to general MT systems as well as to EBMT.
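To make the three criteria concrete, the sketch below computes them over word tokens; this is a minimal illustration under assumed choices (whitespace tokenization, unweighted counts), not the paper's exact implementation. The mathematical kinship between the last two measures is also visible directly: over word sets C and R, Dice = 2|C ∩ R| / (|C| + |R|) while cosine = |C ∩ R| / sqrt(|C| · |R|), so by the AM-GM inequality Dice <= cosine, with equality when |C| = |R|, which is consistent with the close agreement between the two scores observed in the experiments.

from collections import Counter
import math

def edit_similarity(candidate, reference):
    # Levenshtein similarity over word tokens: 1 - edit_distance / max_length.
    c, r = candidate.split(), reference.split()
    dp = list(range(len(r) + 1))            # single-row dynamic programming table
    for i in range(1, len(c) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(r) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (c[i - 1] != r[j - 1]))      # substitution
            prev = cur
    return 1.0 - dp[len(r)] / max(len(c), len(r), 1)

def cosine_correlation(candidate, reference):
    # Cosine of the angle between word-frequency vectors.
    cv, rv = Counter(candidate.split()), Counter(reference.split())
    dot = sum(cv[w] * rv[w] for w in cv)
    norm = math.sqrt(sum(v * v for v in cv.values())) * \
           math.sqrt(sum(v * v for v in rv.values()))
    return dot / norm if norm else 0.0

def dice_coefficient(candidate, reference):
    # Dice over word sets: 2 * |C intersect R| / (|C| + |R|).
    cs, rs = set(candidate.split()), set(reference.split())
    denom = len(cs) + len(rs)
    return 2.0 * len(cs & rs) / denom if denom else 0.0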
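The regression and significance-testing step can likewise be sketched as an ordinary least-squares fit. The six score pairs below are hypothetical placeholders standing in for the per-system automatic and human scores, not the paper's data.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for six MT systems (placeholders, not the paper's data).
auto_scores = np.array([0.42, 0.55, 0.61, 0.48, 0.70, 0.66])
human_scores = np.array([2.1, 2.8, 3.0, 2.4, 3.6, 3.3])

# Pearson correlation coefficient and its significance;
# significance at the 99% level corresponds to p < 0.01.
r, p = pearsonr(auto_scores, human_scores)

# Linear regression mapping automatic scores onto human scores:
# human ~ slope * auto + intercept.
slope, intercept = np.polyfit(auto_scores, human_scores, 1)

# Predict the human score of a new system from its automatic score.
predicted = slope * 0.58 + intercept
print(f"r = {r:.3f}, p = {p:.4f}, predicted human score = {predicted:.2f}")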
