NICT-2 Translation System for WAT2016: Applying Domain Adaptation to Phrase-based Statistical Machine Translation

This paper describes the NICT-2 translation system for the 3rd Workshop on Asian Translation (WAT2016). The system employs a domain adaptation method based on feature augmentation. We treated the Japan Patent Office Corpus as a mixture of corpora from four domains and improved the translation quality of each domain. In addition, we incorporated language models constructed from Google n-grams as external knowledge. Our domain adaptation method naturally incorporates such external knowledge, which contributes to translation quality.
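The general idea of feature augmentation, in the style of Daumé III's "frustratingly easy" domain adaptation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the NICT-2 implementation. Each log-linear feature is copied into a shared "common" space and into the space of the sentence's own domain, and the copies for all other domains are set to zero. Tuning then learns one weight per augmented feature, so common weights capture behaviour shared across domains and domain-specific weights adjust it per domain. The domain labels `d1` to `d4` and the feature names `tm` and `lm` are hypothetical placeholders. An external-knowledge score, such as one from a Google n-gram language model, would simply be one more entry in the feature dictionary. Whether the actual system shares such a feature across domains or not is not stated here.

```python
from typing import Dict, List

# Hypothetical domain labels: the abstract says only that the JPO corpus
# is treated as four domains, so these names are placeholders.
DOMAINS: List[str] = ["d1", "d2", "d3", "d4"]


def augment(features: Dict[str, float], domain: str,
            domains: List[str] = DOMAINS) -> Dict[str, float]:
    """Feature augmentation: copy each feature into a shared "common"
    space and into the space of its own domain; other domains get 0."""
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain}")
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        augmented[f"common:{name}"] = value
        for d in domains:
            augmented[f"{d}:{name}"] = value if d == domain else 0.0
    return augmented


def score(augmented: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear score: weighted sum of augmented features.
    Features without a weight contribute nothing."""
    return sum(weights.get(k, 0.0) * v for k, v in augmented.items())


if __name__ == "__main__":
    # Hypothetical translation-model and language-model scores
    # for one hypothesis from domain d2.
    x = augment({"tm": -2.0, "lm": -5.0}, "d2")
    # Hypothetical tuned weights; the d1 weight is ignored because the
    # d1 copy of each feature is zero for a d2 sentence.
    w = {"common:tm": 1.0, "d2:tm": 0.5, "d1:tm": 3.0}
    print(score(x, w))  # -3.0
```

Because the copies for other domains are zero, a single weight vector tuned on all domains at once behaves like a separate model per domain that still shares the common weights.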
