The System Combination RWTH Aachen - SYSTRAN for the NTCIR-10 PatentMT Evaluation

This paper describes the joint submission by RWTH Aachen University and SYSTRAN to the Chinese-English Patent Machine Translation Task at the 10th NTCIR Workshop. We describe the statistical systems developed by RWTH Aachen University and the hybrid machine translation systems developed by SYSTRAN. We then apply RWTH Aachen’s system combination techniques to create consensus hypotheses from very different systems: phrase-based and hierarchical statistical MT (SMT), rule-based MT (RBMT), and MT with statistical post-editing (SPE). The system combination ranked second in BLEU and second in the human adequacy evaluation in this competition.
