Neural-based machine translation for medical text domain. Based on European Medicines Agency leaflet texts

Abstract: The quality of machine translation is improving rapidly. Several machine translation systems available on the web now provide reasonable, though imperfect, translations. In some specific domains, however, quality may decrease. A recently proposed approach to this problem is neural machine translation, which aims to build a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training statistical and neural network-based translation systems. The main machine translation evaluation metrics were also used to analyze the systems. The main focus of our experiments is the comparison of these systems and the implementation of a real-time medical translator.
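
The encoder-decoder idea mentioned in the abstract can be illustrated with a toy sketch. This is not the system built in the paper: the weights are random and untrained, the vocabulary sizes, hidden size, and token ids are assumptions chosen for illustration, and reusing the end-of-sentence id as the decoder's start token is a simplification. The sketch only shows that source sentences of any length are compressed into a vector of the same fixed size, from which the decoder then emits target tokens one at a time.

```python
import math
import random

random.seed(0)  # untrained random weights: the outputs are meaningless, the shapes are the point

HIDDEN = 8      # size of the fixed-length sentence vector (assumed)
SRC_VOCAB = 20  # toy source vocabulary size (assumed)
TGT_VOCAB = 20  # toy target vocabulary size (assumed)
EOS = 0         # hypothetical end-of-sentence id, also used here as the start token


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_rec, embedding, hidden):
    """One tanh recurrent step: h' = tanh(W_in * e + W_rec * h)."""
    a = matvec(w_in, embedding)
    b = matvec(w_rec, hidden)
    return [math.tanh(x + y) for x, y in zip(a, b)]


src_emb = rand_matrix(SRC_VOCAB, HIDDEN)
tgt_emb = rand_matrix(TGT_VOCAB, HIDDEN)
enc_in, enc_rec = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
dec_in, dec_rec = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
dec_out = rand_matrix(TGT_VOCAB, HIDDEN)


def encode(src_ids):
    """Read the source tokens in order; the final hidden state is the sentence vector."""
    hidden = [0.0] * HIDDEN
    for tok in src_ids:
        hidden = rnn_step(enc_in, enc_rec, src_emb[tok], hidden)
    return hidden


def decode(context, max_len=10):
    """Greedily emit target token ids, starting from the encoder's sentence vector."""
    hidden, prev, output = context, EOS, []
    for _ in range(max_len):
        hidden = rnn_step(dec_in, dec_rec, tgt_emb[prev], hidden)
        scores = matvec(dec_out, hidden)
        prev = max(range(TGT_VOCAB), key=scores.__getitem__)
        if prev == EOS:
            break
        output.append(prev)
    return output


short_vec = encode([3, 7, 1])
long_vec = encode([3, 7, 1, 4, 9, 2, 5])
translation = decode(short_vec)
```

Both `short_vec` and `long_vec` have `HIDDEN` entries even though the inputs differ in length, which is the fixed-length bottleneck the abstract describes. A real system would learn all of these weights jointly from a parallel corpus.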
