The IWSLT 2015 Evaluation Campaign

The IWSLT 2015 Evaluation Campaign featured three tracks: automatic speech recognition (ASR), spoken language translation (SLT), and machine translation (MT). For ASR we offered two tasks, on English and German, while for SLT and MT a number of tasks were proposed, involving English, German, French, Chinese, Czech, Thai, and Vietnamese. All tracks involved the transcription or translation of TED talks, made available either by the official TED website or by TEDx events. A notable change with respect to previous evaluations was the use of unsegmented speech in the SLT track, in order to better fit a real application scenario. Thus, on the one hand participants were encouraged to develop advanced methods for sentence segmentation, while on the other hand organisers had to cope with the automatic evaluation of SLT outputs that do not match the sentence-wise arrangement of the human references. A new evaluation server was also developed to allow participants to score their MT and SLT systems on selected dev and test sets. This year 16 teams participated in the evaluation, for a total of 63 primary submissions. All runs were evaluated with objective metrics, and submissions for two of the MT tasks were also evaluated with human post-editing.
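The mismatch between unsegmented SLT output and sentence-segmented references is typically resolved by automatically resegmenting the system output against the references before scoring. The following is a minimal illustrative sketch, not the official IWSLT tooling (which relied on edit-distance-based segmentation along the lines of Matusov et al., 2005): it splits an unsegmented hypothesis word stream into as many segments as there are reference sentences, minimising the total word-level edit distance, so that standard sentence-level metrics can then be applied. The function names and the deliberately simple cubic-time dynamic programme are assumptions made for clarity.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # match / substitution
        prev = cur
    return prev[-1]


def resegment(hyp_words, ref_sentences):
    """Split hyp_words into len(ref_sentences) segments such that the total
    edit distance between segment k and reference sentence k is minimal."""
    refs = [s.split() for s in ref_sentences]
    n, k_max = len(hyp_words), len(refs)
    INF = float("inf")
    # best[k][i]: minimal cost of aligning the first k reference sentences
    # to the first i hypothesis words; cut[k][i]: where segment k starts.
    best = [[INF] * (n + 1) for _ in range(k_max + 1)]
    cut = [[0] * (n + 1) for _ in range(k_max + 1)]
    best[0][0] = 0
    for k in range(1, k_max + 1):
        for i in range(n + 1):
            for j in range(i + 1):
                if best[k - 1][j] == INF:
                    continue
                cost = best[k - 1][j] + edit_distance(hyp_words[j:i], refs[k - 1])
                if cost < best[k][i]:
                    best[k][i], cut[k][i] = cost, j
    # Backtrack to recover the segment boundaries.
    bounds, i = [], n
    for k in range(k_max, 0, -1):
        bounds.append((cut[k][i], i))
        i = cut[k][i]
    return [" ".join(hyp_words[a:b]) for a, b in reversed(bounds)]


if __name__ == "__main__":
    hyp = "hello world this is just a test".split()
    refs = ["hello world", "this is a test"]
    print(resegment(hyp, refs))  # -> ['hello world', 'this is just a test']
```

The exhaustive search over split points is kept for readability; production segmenters instead compute a single global alignment of the whole hypothesis against the concatenated references and cut where the alignment crosses reference sentence boundaries, which is far more efficient on full test sets.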
