UoS Participation in the WMT20 Translation of Biomedical Abstracts

This paper describes the machine translation systems developed by the University of Sheffield (UoS) team for the biomedical translation shared task of WMT20. Our systems are based on the Transformer model, implemented with the TensorFlow Model Garden toolkit. We participated in ten translation directions, covering both directions of the English/Spanish, English/Portuguese, English/Russian, English/Italian, and English/French language pairs. To create our training data, we concatenated several parallel corpora from both in-domain and out-of-domain sources.
