Length-Incremental Phrase Training for SMT

We present an iterative technique for generating phrase tables for SMT, based on force-aligning the training data with a modified translation decoder. Unlike previous work, we completely avoid word alignments and phrase extraction heuristics, moving towards more principled phrase generation and probability estimation. During training, we allow the decoder to generate new phrases on the fly and increase the maximum phrase length in each iteration. Experiments are carried out on the IWSLT 2011 Arabic-English task, where our training method yields moderate improvements over a state-of-the-art baseline. The resulting phrase table shows only a small overlap with the heuristically extracted one, which demonstrates how restrictive it is to limit phrase selection by a word alignment or heuristics. By interpolating the heuristic and the trained phrase table, we improve over the baseline by 0.5% BLEU and 0.5% TER.
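
The abstract does not say how the two phrase tables are compared or combined. The sketch below shows one plausible reading, assuming a simple set-based overlap measure and a linear interpolation of phrase translation probabilities. The function names, the `weight` parameter, and the romanized toy entries are hypothetical illustrations, not the paper's implementation or data.

```python
from typing import Dict, Tuple

# (source phrase, target phrase) -> p(target | source); a toy stand-in
# for a real phrase table, which would carry several feature scores.
PhraseTable = Dict[Tuple[str, str], float]


def overlap(a: PhraseTable, b: PhraseTable) -> float:
    """Fraction of phrase pairs in `a` that also occur in `b`."""
    if not a:
        return 0.0
    return len(a.keys() & b.keys()) / len(a)


def interpolate(heuristic: PhraseTable, trained: PhraseTable,
                weight: float = 0.5) -> PhraseTable:
    """Linearly interpolate two phrase tables (an assumed combination scheme).

    A pair missing from one table contributes probability 0.0 from that
    table, so pairs found in only one table are kept but down-weighted.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    pairs = heuristic.keys() | trained.keys()
    return {
        pair: weight * heuristic.get(pair, 0.0)
        + (1.0 - weight) * trained.get(pair, 0.0)
        for pair in pairs
    }


# Hypothetical toy tables, invented for illustration only.
heuristic_table: PhraseTable = {
    ("kitab", "book"): 0.8,
    ("kitab", "the book"): 0.2,
}
trained_table: PhraseTable = {
    ("kitab", "book"): 1.0,
    ("al kitab", "the book"): 1.0,
}

shared = overlap(heuristic_table, trained_table)
combined = interpolate(heuristic_table, trained_table, weight=0.5)
```

In this toy setup the source phrase "al kitab" occurs only in the trained table, so its interpolated probability mass no longer sums to one. A real system would have to renormalize or back off in that case, and the abstract does not say which the authors chose.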

Hermann Ney | Joern Wuebker
