The University of Helsinki Submissions to the WMT19 News Translation Task

In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish, we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.
