Automated metrics for speech translation

In this paper, we describe the automated measures used to evaluate machine translation quality in the Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program of the Defense Advanced Research Projects Agency (DARPA). The program is developing speech translation systems for dialogue between English and Iraqi Arabic speakers in military contexts. We illustrate the limitations of these measures, along with variants designed to overcome them. Both the dialogue structure of the data and the Iraqi Arabic language pose challenges for these measures, and we present some of the solutions that MITRE and NIST adopted to improve confidence in the scores.
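The following minimal sketch illustrates one way Arabic morphology can challenge word-level overlap measures. It assumes a BLEU-style clipped unigram precision, a simplification chosen for illustration rather than the exact measure used in the evaluations. The prefix list and the `strip_prefix` helper are hypothetical. They are a crude prefix-stripping normalization standing in for the kind of variant a metric might apply, not the solution MITRE or NIST adopted.

```python
from collections import Counter

# Hypothetical prefix list (illustrative assumption, not a validated stemmer):
# definite article "al-" combined with w- "and", b- "with/in", l- "for",
# f- "so", then the bare single-letter proclitics. Longer prefixes come first
# so that they are tried before their shorter substrings.
PREFIXES = ("وال", "بال", "لل", "فال", "ال", "و", "ب", "ل", "ف")


def unigram_precision(hypothesis, reference):
    """Clipped unigram precision: each hypothesis token counts as matched at
    most as many times as it occurs in the reference (BLEU-style clipping)."""
    if not hypothesis:
        return 0.0
    hyp_counts = Counter(hypothesis)
    ref_counts = Counter(reference)
    matched = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    return matched / len(hypothesis)


def strip_prefix(token, min_stem=2):
    """Remove the first listed prefix that matches, provided at least
    `min_stem` characters of the word remain; otherwise return the token."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_stem:
            return token[len(prefix):]
    return token


# Toy example: reference "and the book is on the table",
# hypothesis "a book on a table".
reference = "والكتاب على الطاولة".split()
hypothesis = "كتاب على طاولة".split()

raw = unigram_precision(hypothesis, reference)
normalized = unigram_precision(
    [strip_prefix(t) for t in hypothesis],
    [strip_prefix(t) for t in reference],
)
print(f"raw: {raw:.2f}, prefix-stripped: {normalized:.2f}")
```

Arabic attaches conjunctions, prepositions, and the definite article to the following word. As a result, a hypothesis that conveys the same content can share few whole tokens with the reference. Stripping prefixes restores those matches, but it can also over-stem words that merely begin with the same letters, which is why any such normalization needs to be validated against human judgments.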
