Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages

Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities. In the fifth edition of the WMT Biomedical Task, we addressed a total of eight language pairs. Five of these had been addressed in past editions of the shared task, namely English/German, English/French, English/Spanish, English/Portuguese, and English/Chinese. Three additional language pairs were introduced this year: English/Russian, English/Italian, and English/Basque. The task addressed the evaluation of both scientific abstracts (all language pairs) and terminologies (English/Basque only). We received submissions from a total of 20 teams. For recurring language pairs, we observed an improvement in the translations in terms of automatic scores and qualitative evaluations, compared to previous years.

∗ The author list is alphabetical and does not reflect the respective author contributions.
