Paper information - Improving Translation Fluency with Search-Based Decoding and a Monolingual Statistical Machine Translation Model for Automatic Post-Editing


The BLEU scores and translation fluency of current state-of-the-art SMT systems based on IBM models are still too low for publication purposes. The major issue is that stochastically generated sentence hypotheses, produced through a stack decoding process, may not strictly follow the grammar of the natural target language: the decoding process is directed by a highly simplified translation model and an n-gram language model, and a large number of noisy phrase pairs may introduce significant search errors. This paper proposes a statistical post-editing (SPE) model, based on a special monolingual SMT paradigm, to "translate" disfluent sentences into fluent sentences. However, instead of conducting a stack decoding process, the sentence hypotheses are searched from fluent target sentences in a large target language corpus or on the Web to ensure fluency. Phrase-based local editing, if necessary, is then applied to correct the weakest phrase alignments between the disfluent and searched hypotheses using fluent target language phrases; such phrases are segmented from a large target language corpus with a global optimization criterion that maximizes the likelihood of the training sentences, instead of using noisy phrases combined from bilingually word-aligned pairs. With such search-based decoding, the absolute BLEU scores are much higher than those of automatic post-editing systems that conduct a classical SMT decoding process. We are also able to fully correct a significant number of disfluent sentences into completely fluent versions. The evaluation shows that, on average, 46% of translation errors can be fully recovered and the BLEU score can be improved by about 26%.
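The search-then-edit idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`search_fluent_hypothesis`, `mismatched_spans`) are hypothetical, the fluent corpus is a small in-memory list rather than a large corpus or the Web, and a token-overlap ratio from the standard-library `difflib` stands in for the monolingual SMT scoring the paper describes.

```python
from difflib import SequenceMatcher


def tokenize(sentence):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return sentence.lower().split()


def search_fluent_hypothesis(disfluent, fluent_corpus):
    """Pick the fluent corpus sentence most similar to the disfluent input.

    Similarity is difflib's token-level ratio, used here only as a
    stand-in for a proper monolingual translation-model score.
    """
    src = tokenize(disfluent)
    best, best_score = None, -1.0
    for candidate in fluent_corpus:
        matcher = SequenceMatcher(None, src, tokenize(candidate), autojunk=False)
        score = matcher.ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def mismatched_spans(disfluent, hypothesis):
    """List the token spans where the two sentences disagree.

    These spans are the candidates for phrase-based local editing.
    """
    src, hyp = tokenize(disfluent), tokenize(hypothesis)
    matcher = SequenceMatcher(None, src, hyp, autojunk=False)
    return [
        (tag, src[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Calling `search_fluent_hypothesis("he go to school every day", corpus)` on a corpus that contains "he goes to school every day" retrieves that sentence, and `mismatched_spans` then isolates the `go` / `goes` disagreement as the only span that would need local editing. Because the hypothesis is taken from fluent text rather than assembled by a decoder, fluency is preserved by construction; the open question in practice is whether the corpus contains a close enough match.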

Jing-Shin Chang | Sheng-Sian Lin
