Overview of the IWSLT 2011 evaluation campaign

We report on the eighth Evaluation Campaign organized by the IWSLT workshop. This year, the IWSLT evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike in previous years, all data supplied for the evaluation have been publicly released on the workshop website and are available to researchers interested in working on our benchmarks and comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 Evaluation Campaign: descriptions of the supplied data and the evaluation specifications of each track, the list of participants and their submitted runs, a detailed description of the subjective evaluation, the main findings of each exercise, drawn from the results and from the system descriptions prepared by the participants, and, finally, several detailed tables reporting all the evaluation results.
