Statistical Machine Translation with a Factorized Grammar

In modern machine translation practice, a statistical phrasal or hierarchical translation system usually relies on a huge set of translation rules extracted from bi-lingual training data. This approach not only results in space and efficiency issues, but also suffers from the sparse data problem. In this paper, we propose to use factorized grammars, an idea widely accepted in the field of linguistic grammar construction, to generalize translation rules, so as to solve these two problems. We designed a method to take advantage of the XTAG English Grammar to facilitate the extraction of factorized rules. We experimented on various setups of low-resource language translation, and showed consistent significant improvement in BLEU over state-of-the-art string-to-dependency baseline systems with 200K words of bi-lingual training data.

[1]  Jinxi Xu,et al.  A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model , 2008, ACL.

[2]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[3]  Xavier Carreras,et al.  Non-Projective Parsing for Statistical Machine Translation , 2009, EMNLP.

[4]  XTAG Research Group,et al.  A Lexicalized Tree Adjoining Grammar for English , 1998, ArXiv.

[5]  Neville Ryant,et al.  A Large-scale Classication of English Verbs , 2006 .

[6]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[7]  Anne Abeillé,et al.  A Lexicalized Tree Adjoining Grammar for English , 1990 .

[8]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[9]  Daniel Marcu,et al.  SPMT: Statistical Machine Translation with Syntactified Target Language Phrases , 2006, EMNLP.

[10]  Philipp Koehn,et al.  Factored Translation Models , 2007, EMNLP.

[11]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[12]  Aravind K. Joshi,et al.  Tree-Adjoining Grammars , 1997, Handbook of Formal Languages.

[13]  Philipp Koehn,et al.  Statistical Significance Tests for Machine Translation Evaluation , 2004, EMNLP.

[14]  Kevin Knight,et al.  Synchronous Tree Adjoining Machine Translation , 2009, EMNLP.

[15]  Hermann Ney,et al.  The Alignment Template Approach to Statistical Machine Translation , 2004, CL.

[16]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[17]  Spyridon Matsoukas,et al.  Effective Use of Linguistic and Contextual Information for Statistical Machine Translation , 2009, EMNLP.

[18]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[19]  Jacob Devlin,et al.  Lexical Features for Statistical Machine Translation , 2009 .