Paper Information - A Unigram Orientation Model for Statistical Machine Translation

A Unigram Orientation Model for Statistical Machine Translation

In this paper, we present a unigram segmentation model for statistical machine translation in which the segmentation units are blocks: pairs of phrases without internal structure. The segmentation model uses a novel orientation component to handle the swapping of neighboring blocks. During training, we collect block unigram counts with orientation: we count how often a block occurs to the left or to the right of some predecessor block. The orientation model is shown to improve translation performance over two baseline models: 1) a model that uses no block reordering, and 2) a model in which block swapping is controlled only by a language model. We show experimental results on a standard Arabic-English translation task.
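The counting step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. Several details are assumptions made only for illustration: each block is a hypothetical `(block_id, src_start, src_end)` tuple with a half-open source span, blocks are listed in target order, "left" and "right" require the block to be directly adjacent to its predecessor on the source side, any other case is counted as "neutral", and the relative-frequency estimate in `orientation_probability` is an assumed form.

```python
from collections import Counter, defaultdict


def collect_orientation_counts(segmentations):
    """Count, per block, how often it occurs left or right of its predecessor.

    Assumption: each segmentation is a list of (block_id, src_start, src_end)
    tuples in target order, with half-open source spans [src_start, src_end).
    """
    counts = defaultdict(Counter)
    for blocks in segmentations:
        # Pair every block with the block generated immediately before it.
        for prev, cur in zip(blocks, blocks[1:]):
            _, prev_start, prev_end = prev
            block_id, cur_start, cur_end = cur
            if cur_start == prev_end:
                # The block sits directly to the right of its predecessor.
                counts[block_id]["right"] += 1
            elif cur_end == prev_start:
                # The block sits directly to the left of its predecessor (a swap).
                counts[block_id]["left"] += 1
            else:
                # Assumption: non-adjacent blocks get a neutral orientation.
                counts[block_id]["neutral"] += 1
    return counts


def orientation_probability(counts, block_id, orientation):
    """Relative frequency of 'left' or 'right' for a block (assumed estimate)."""
    total = counts[block_id]["left"] + counts[block_id]["right"]
    if total == 0:
        return 0.0
    return counts[block_id][orientation] / total


# Two toy segmentations; the block names b1, b2, b3 are hypothetical.
segmentations = [
    [("b1", 0, 2), ("b2", 2, 4), ("b3", 4, 5)],  # monotone order
    [("b2", 3, 5), ("b1", 1, 3)],                # b1 is swapped to the left of b2
]
counts = collect_orientation_counts(segmentations)
```

In this toy data, `b2` and `b3` are each seen once to the right of their predecessor, and `b1` is seen once to the left of its predecessor. A decoder could use such counts to score whether swapping two neighboring blocks is plausible, although how the paper combines them with the other model components is not stated in the abstract.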

Christoph Tillmann
