Spoken dialogue translation systems evaluation: results, new trends, problems and proposals

Evaluating spoken dialogue translation systems is important, but, as we show by analyzing the evaluation methods used in the Verbmobil, C-STAR II, and Nespole! projects, the current state of the art is not fully satisfactory: subjective methods are too costly, while objective methods, although cheaper, give little indication of usability. We propose several ideas for improving this situation.
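For concreteness, the sketch below illustrates the kind of objective method at issue: a word-level edit distance (word error rate) between a system output and a reference translation. This is an illustrative assumption on our part, not the specific protocol used in the projects analyzed; the function name and the example sentences are invented for the sketch.

```python
# Minimal sketch of one common objective MT metric: word error rate,
# i.e., word-level edit distance to a reference translation, computed
# with the classic dynamic-programming algorithm. Illustrative only;
# not the evaluation protocol of Verbmobil, C-STAR II, or Nespole!.

def word_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance between hypothesis and reference word sequences,
    normalized by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # d[i][j] = minimum number of insertions, deletions, and
    # substitutions needed to turn hyp[:i] into ref[:j].
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(hyp)][len(ref)] / max(len(ref), 1)

if __name__ == "__main__":
    hyp = "the meeting could take place on monday"
    ref = "the meeting could be held on monday"
    print(f"WER = {word_error_rate(hyp, ref):.2f}")  # 2 edits / 7 words ~ 0.29
```

Because such metrics only compare surface strings against references, a translation can score well yet still fail the dialogue it serves (or score poorly yet succeed), which is precisely the usability gap noted above.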
