DiaBLa: a corpus of bilingual spontaneous written dialogues for machine translation

We present a new English-French test set for the evaluation of Machine Translation (MT) for informal, written bilingual dialogue. The test set contains 144 spontaneous dialogues (5,700+ sentences) between native English and French speakers, mediated by one of two neural MT systems in a range of role-play settings. The dialogues are accompanied by fine-grained sentence-level judgments of MT quality, produced by the dialogue participants themselves, as well as by manually normalised versions and reference translations produced a posteriori. The motivation for the corpus is two-fold: to provide (i) a unique resource for evaluating MT models, and (ii) a corpus for the analysis of MT-mediated communication. We provide a preliminary analysis of the corpus to confirm that the participants' judgments reveal perceptible differences in MT quality between the two MT systems used.

[1]  Rico Sennrich,et al.  Evaluating Discourse Phenomena in Neural Machine Translation , 2017, NAACL.

[2]  Noah A. Smith,et al.  A Simple, Fast, and Effective Reparameterization of IBM Model 2 , 2013, NAACL.

[3]  Jörg Tiedemann,et al.  OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles , 2016, LREC.

[4]  Margaret King,et al.  Using Test Suites in Evaluation of Machine Translation Systems , 1990, COLING.

[5]  R. Fisher 019: On the Interpretation of x2 from Contingency Tables, and the Calculation of P. , 1922 .

[6]  André F. T. Martins,et al.  Marian: Fast Neural Machine Translation in C++ , 2018, ACL.

[7]  Hitoshi Iida,et al.  A speech and language database for speech translation research , 1994, ICSLP.

[8]  Christian Federmann,et al.  Microsoft Speech Language Translation (MSLT) Corpus: The IWSLT 2016 release for English, French and German , 2016, IWSLT.

[9]  Wolfgang Wahlster,et al.  Mobile Speech-to-Speech Translation of Spontaneous Dialogs: An Overview of the Final Verbmobil System , 2000 .

[10]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[11]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[12]  Eiichiro Sumita,et al.  Multilingual Spoken Language Corpus Development for Communication Research , 2006, ROCLING/IJCLCLP.

[13]  Rico Sennrich,et al.  Controlling Politeness in Neural Machine Translation via Side Constraints , 2016, NAACL.

[14]  Jörg Tiedemann,et al.  OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora , 2018, LREC.

[15]  Rico Sennrich,et al.  Nematus: a Toolkit for Neural Machine Translation , 2017, EACL.

[16]  R. Fisher On the Interpretation of χ2 from Contingency Tables, and the Calculation of P , 2010 .

[17]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[18]  Eiichiro Sumita,et al.  Toward a Broad-coverage Bilingual Corpus for Speech Translation of Travel Conversations in the Real World , 2002, LREC.

[19]  Yuka Kobayashi,et al.  The dialogue breakdown detection challenge: Task description, datasets, and evaluation metrics , 2016, LREC.

[20]  Jörg Tiedemann,et al.  Neural Machine Translation with Extended Context , 2017, DiscoMT@EMNLP.

[21]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[22]  Pierre Isabelle,et al.  A Challenge Set Approach to Evaluating Machine Translation , 2017, EMNLP.