Supertags as source language context in hierarchical phrase-based SMT

Statistical machine translation (SMT) models have recently begun to include source context modeling, under the assumption that the proper lexical choice of translation for an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features have been explored as source context to improve phrase selection in SMT. In the present work, we introduce lexico-syntactic descriptions in the form of supertags as source-side context features in the state-of-the-art hierarchical phrase-based SMT (HPB) model. These features enable us to exploit source similarity in addition to target similarity, which is modeled by the language model. In our experiments, two kinds of supertags are employed: those from lexicalized tree-adjoining grammar (LTAG) and those from combinatory categorial grammar (CCG). We use a memory-based classification framework that enables the efficient estimation of these features. Despite the differences between the two supertagging approaches, they give similar improvements. We evaluate our approach on an English-to-Dutch translation task and report statistically significant improvements in BLEU score of 4.48% and 6.3% when adding CCG and LTAG supertags, respectively, as context-informed features.
