Post-édition statistique pour l’adaptation aux domaines de spécialité en traduction automatique (Statistical Post-Editing of Machine Translation for Domain Adaptation) [in French]

Statistical Post-Editing of Machine Translation for Domain Adaptation This paper presents a statistical approach to adapt generic machine translation systems to the medical domain through an unsupervised post-edition step. A statistical post-edition model is built on statistical machine translation outputs aligned with their translation references. Evaluations carried out to translate medical texts from French to English show that a generic machine translation system can be adapted a posteriori to a specific domain. Two systems are studied : a state-of-the-art phrase-based implementation and an online publicly available software. Our experiments also indicate that selecting sentences for post-edition leads to significant improvements of translation quality and that more gains are still possible with respect to an oracle measure.

[1]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[2]  Philipp Koehn,et al.  Statistical Post-Editing on SYSTRAN‘s Rule-Based Translation System , 2007, WMT@ACL.

[3]  Cyril Goutte,et al.  Domain adaptation of MT systems through automatic post-editing , 2007, MTSUMMIT.

[4]  Roland Kuhn,et al.  Rule-Based Translation with Statistical Phrase-Based Post-Editing , 2007, WMT@ACL.

[5]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[6]  Andy Way,et al.  A cluster-based representation for multi-system MT evaluation , 2007 .

[7]  Hirokazu Suzuki,et al.  Automatic Post-Editing based on SMT and its selective application by Sentence-Level Automatic Quality Evaluation , 2011, MTSUMMIT.

[8]  Josef van Genabith,et al.  Statistical Post-Editing for a Statistical MT System , 2011, MTSUMMIT.

[9]  Juan C Sager,et al.  English Special Languages: Principles and Practice in Science and Technology , 1980 .

[10]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[11]  Michel Simard,et al.  Statistical Phrase-Based Post-Editing , 2007, NAACL.

[12]  Gorka Labaka,et al.  Statistical Post-Editing : A Valuable Method in Domain Adaptation of RBMT Systems for Less-Resourced Languages , 2008 .

[13]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[14]  Kemal Oflazer,et al.  Exploring Different Representational Units in English-to-Turkish Statistical Machine Translation , 2007, WMT@ACL.

[15]  Jörg Tiedemann,et al.  News from OPUS — A collection of multilingual parallel corpora with tools and interfaces , 2009 .

[16]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[17]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[18]  Stephan Vogel,et al.  Parallel Implementations of Word Alignment Tool , 2008, SETQALNLP.