Paper information - Robust Estimation of Feature Weights in Statistical Machine Translation

Robust Estimation of Feature Weights in Statistical Machine Translation

The weights of the various components in a standard Statistical Machine Translation model are usually estimated via Minimum Error Rate Training (MERT). This procedure finds their optimal values on a development set, with the expectation that these weights generalise well to other test sets. However, this is not always the case when domains differ. This work uses a perceptron algorithm to learn more robust weights for use on out-of-domain corpora without the need for specialised data. For an Arabic-to-English translation system, the generalised weights yield an improvement of more than 2 BLEU points over the MERT baseline using the same information.

Lluís Màrquez i Villodre | Cristina España-Bonet
