Human in the Loop Machine Translation of Medical Terminology

Abstract: This memorandum report outlines the Human in the Loop Translation process, which aims to increase translators' productivity using statistical machine translation (SMT). In this process, U.S. Army Research Laboratory (ARL) systems trained with open source SMT toolkits were placed in a feedback loop with human translators in Afghanistan to incrementally improve Dari translations of English medical training manuals and to create bilingual glossaries in the medical domain. Anecdotal evidence indicates that the machine translation drafts produced through this feedback loop are of high enough quality to assist human translators in translating medical training manuals. For a very narrow task, automatic scores showed large improvements in translation quality as the loop progressed, providing indirect evidence that SMT can benefit human translators even with small amounts of training data.
