English-Catalan Neural Machine Translation in the Biomedical Domain through the Cascade Approach

This paper describes the methodology used to build a neural machine translation system for the English-Catalan language pair in the biomedical domain. The task is low-resource in two respects: the domain and the language pair. To address it, we report experiments with a cascade pivot strategy through Spanish, in which an English-Spanish system trained on the SCIELO corpus feeds a Spanish-Catalan system trained on the El Periódico corpus. To evaluate the final system, we created a new English-Catalan test set in the biomedical domain, which is freely available on request.
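The cascade pivot idea can be illustrated with a minimal sketch: the output of an English-Spanish translator is passed as input to a Spanish-Catalan translator. The names `make_cascade` and `toy_translator` and the word-level lookup tables are hypothetical stand-ins introduced only for illustration; they are not the trained neural systems described in the paper.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def make_cascade(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    """Compose two translators into one source-to-target translator."""
    def translate(sentence: str) -> str:
        pivot = src_to_pivot(sentence)   # e.g. English -> Spanish
        return pivot_to_tgt(pivot)       # e.g. Spanish -> Catalan
    return translate


def toy_translator(table: Dict[str, str]) -> Translator:
    """Hypothetical word-by-word lookup standing in for a trained NMT model.

    Words missing from the table are copied through unchanged.
    """
    def translate(sentence: str) -> str:
        return " ".join(table.get(word, word) for word in sentence.split())
    return translate


# Assumed toy vocabularies, for illustration only.
en_es = toy_translator({"the": "el", "patient": "paciente", "sleeps": "duerme"})
es_ca = toy_translator({"paciente": "pacient", "duerme": "dorm"})

en_ca = make_cascade(en_es, es_ca)
print(en_ca("the patient sleeps"))
```

Because the two stages are independent, each can be trained on its own parallel corpus, which is what makes the approach usable when no direct English-Catalan training data exists. The trade-off is that any error made in the first stage is passed on to the second.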
