Phrase Clustering for Smoothing TM Probabilities - or, How to Extract Paraphrases from Phrase Tables

This paper describes how to cluster together the phrases of a phrase-based statistical machine translation (SMT) system, using information in the phrase table itself. The clustering is symmetric and recursive: it is applied both to source-language and target-language phrases, and the clustering in one language helps determine the clustering in the other. The phrase clusters have many possible uses. This paper looks at one of these uses: smoothing the conditional translation model (TM) probabilities employed by the SMT system. We incorporated phrase-cluster-derived probability estimates into a baseline loglinear feature combination that included relative frequency and lexically-weighted conditional probability estimates. In Chinese-English (C-E) and French-English (F-E) learning curve experiments, we obtained a gain over the baseline in 29 of 30 tests, with a maximum gain of 0.55 BLEU points (though most gains were fairly small). The largest gains came with medium (200--400K sentence pairs) rather than with small (less than 100K sentence pairs) amounts of training data, contrary to what one would expect from the paraphrasing literature. We have only begun to explore the original smoothing approach described here.

[1]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[2]  Roland Kuhn,et al.  Phrasetable Smoothing for Statistical Machine Translation , 2006, EMNLP.

[3]  Chris Callison-Burch,et al.  Paraphrasing with Bilingual Parallel Corpora , 2005, ACL.

[4]  José A. R. Fonollosa,et al.  Smooth Bilingual N-Gram Translation , 2007, EMNLP.

[5]  Haifeng Wang,et al.  Pivot Approach for Extracting Paraphrase Patterns from Bilingual Corpora , 2008, ACL.

[6]  Hermann Ney,et al.  Improvements in Phrase-Based Statistical Machine Translation , 2004, NAACL.

[7]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[8]  David Chiang,et al.  Forest Rescoring: Faster Decoding with Integrated Language Models , 2007, ACL.

[9]  Wolfgang Macherey,et al.  Lattice-based Minimum Error Rate Training for Statistical Machine Translation , 2008, EMNLP.

[10]  Miles Osborne,et al.  Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.

[11]  Philipp Koehn,et al.  Improved Statistical Machine Translation Using Paraphrases , 2006, NAACL.

[12]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[13]  Chris Callison-Burch,et al.  Syntactic Constraints on Paraphrases Extracted from Parallel Corpora , 2008, EMNLP.

[14]  Chris Callison-Burch,et al.  Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases , 2009, EMNLP.