Domain Adaptation for SMT Using Sentence Weight

We describe a sentence-level domain adaptation approach for statistical machine translation, in which the system is trained with a sentence-weight model. The approach is fine-grained: it exploits the domain information carried by each individual sentence rather than treating a corpus as a single unit. Each sentence in the training set is assigned a weight that reflects how strongly it matches the target domain, and these weights improve the system's ability to adapt to that domain. The sentence-weight model is built from the similarity between each training sentence and the target-domain text, where similarity is measured by word frequency distribution. Experiments on a large-scale Chinese-to-English translation task in the news domain validate the effectiveness of the sentence-weight-based adaptation approach, with gains of up to 0.75 BLEU over a non-adapted baseline system.
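As a rough illustration of the weighting idea, the sketch below scores each training sentence by how closely its word frequency distribution matches that of the target-domain text. The abstract does not state which similarity function is used, so cosine similarity is an assumption here, as are whitespace tokenization and the hypothetical function names `word_freq`, `cosine`, and `sentence_weights`.

```python
import math
from collections import Counter


def word_freq(sentences):
    """Relative word frequencies over a list of sentences.

    Tokenization by lowercasing and whitespace splitting is an
    assumption made for this sketch.
    """
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors.

    Cosine is an assumed stand-in; the source only says similarity
    is measured by word frequency distribution.
    """
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def sentence_weights(train_sentences, target_sentences):
    """One weight per training sentence; higher means closer to the target domain."""
    target = word_freq(target_sentences)
    return [cosine(word_freq([s]), target) for s in train_sentences]


if __name__ == "__main__":
    target_text = [
        "the central bank raised interest rates",
        "the government announced new policy",
    ]
    training = ["the bank raised rates", "my cat likes fish"]
    for sent, w in zip(training, sentence_weights(training, target_text)):
        print(f"{w:.3f}\t{sent}")
```

In a full system, weights like these would scale each sentence pair's contribution when the translation model is estimated; that step is not shown because the abstract does not describe it.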
