LIMSI @ WMT 2020

This paper describes LIMSI's submissions to the translation shared tasks at WMT'20. This year we focused our efforts on the biomedical translation task, developing a resource-heavy system for translating medical abstracts from English into French that uses back-translated texts, terminological resources, and multiple pre-processing pipelines, including pre-trained representations. Systems were also prepared for the robustness task, translating from English into German; for this large-scale task, we developed multi-domain, noise-robust translation systems aimed at handling its two test conditions: zero-shot and few-shot domain adaptation.
