System Description of NiCT-ATR SMT for NTCIR-7

In this paper, we propose a method to improve SMT-based patent translation. The method first uses the International Patent Classification to build class-based models. The multiple models are then interpolated using a weighting method that employs source-side language models. We carried out experiments using data from the patent translation task of the NTCIR-7 workshop. According to the experimental results, the proposed method improved most of the automatic scores, namely NIST, WER, and PER. The experimental results also show a degradation in BLEU score for the proposed method. However, statistical tests by bootstrapping do not show that this degradation is significant.
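
The abstract does not give the weighting formula, so the following is only a minimal sketch of one way source-side language models could produce interpolation weights. It assumes add-one smoothed unigram models per patent class and weights obtained by normalizing the per-word average likelihood of the input sentence. The class names, toy sentences, and the functions `train_unigram_lm`, `interpolation_weights`, and `interpolate` are hypothetical illustrations, not the system's actual implementation.

```python
import math
from collections import Counter

def train_unigram_lm(sentences, vocab_size, alpha=1.0):
    """Return a log-probability function for an add-alpha smoothed unigram LM."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())

    def logprob(word):
        return math.log((counts[word] + alpha) / (total + alpha * vocab_size))

    return logprob

def interpolation_weights(source_sentence, class_lms):
    """Weight each class by how well its source-side LM fits the sentence."""
    words = source_sentence.split()
    # Per-word average log-likelihood under each class LM.
    avg_ll = {
        c: sum(lm(w) for w in words) / max(len(words), 1)
        for c, lm in class_lms.items()
    }
    # Normalize with a softmax; subtracting the max keeps exp() stable.
    best = max(avg_ll.values())
    unnorm = {c: math.exp(v - best) for c, v in avg_ll.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}

def interpolate(weights, class_scores):
    """Linear interpolation of per-class model probabilities."""
    return sum(weights[c] * class_scores[c] for c in weights)

# Toy source-side data grouped by (hypothetical) patent class.
source_data = {
    "chemistry": ["the compound reacts with acid", "acid solution is heated"],
    "electronics": ["the circuit has a transistor",
                    "voltage is applied to the circuit"],
}
vocab = {w for sents in source_data.values() for s in sents for w in s.split()}
class_lms = {
    c: train_unigram_lm(sents, len(vocab)) for c, sents in source_data.items()
}

weights = interpolation_weights("the acid is heated", class_lms)
# Hypothetical per-class probabilities for one translation candidate.
score = interpolate(weights, {"chemistry": 0.4, "electronics": 0.1})
```

In this sketch the weights are computed per input sentence, so a sentence that resembles one class's source text leans on that class's model. A real system would more likely use higher-order n-gram models, for example ones trained with SRILM.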
