Integration of POStag-based Source Reordering into SMT Decoding by an Extended Search Graph

This paper presents a reordering framework for statistical machine translation (SMT) in which source-side reorderings are integrated into SMT decoding, allowing for a highly constrained reordered search graph. The monotone search is extended by means of a set of reordering patterns (linguistically motivated rewrite patterns). The patterns are learnt automatically in training from word-to-word alignments and source-side Part-Of-Speech (POS) tags. While traversing the extended search graph, the decoder evaluates every hypothesis using a group of widely used SMT models together with an additional N-gram language model of source-side POS tags. Experiments are reported on the Euparl task (Spanish-to-English and English-to-Spanish). Results are presented regarding translation accuracy (using human and automatic evaluations) and computational efficiency, showing significant improvements in translation quality for both translation directions at a very low computational cost.
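As a rough illustration of how rewrite patterns can extend a monotone search, the sketch below enumerates the source-word orders licensed by a small pattern set. It is not the paper's implementation: the function name `reordered_paths`, the pattern representation (a POS-tag sequence mapped to a permutation of relative positions), the tag labels, and the example sentence are all assumptions made for illustration. A real decoder would share states in a graph rather than list every path.

```python
from typing import Dict, Sequence, Set, Tuple

# Hypothetical representation: a pattern maps a sequence of source POS tags
# to a permutation of the relative positions inside the matched span.
Pattern = Tuple[str, ...]
Permutation = Tuple[int, ...]


def reordered_paths(
    tags: Sequence[str], patterns: Dict[Pattern, Permutation]
) -> Set[Tuple[int, ...]]:
    """Return every order of source positions licensed by the patterns.

    The monotone order is always included; each pattern match adds
    alternative paths in which the matched span is permuted.
    """
    n = len(tags)

    def expand(start: int) -> Set[Tuple[int, ...]]:
        if start == n:
            return {()}
        # Monotone step: consume the next source position unchanged.
        paths = {(start,) + rest for rest in expand(start + 1)}
        # Reordering steps: consume a whole matched span in permuted order.
        for pattern, permutation in patterns.items():
            end = start + len(pattern)
            if end <= n and tuple(tags[start:end]) == pattern:
                block = tuple(start + k for k in permutation)
                paths |= {block + rest for rest in expand(end)}
        return paths

    return expand(0)


# Illustrative Spanish input with assumed tag labels
# (NC = noun, AQ = adjective, CC = conjunction).
words = ["programa", "ambicioso", "y", "realista"]
tags = ["NC", "AQ", "CC", "AQ"]
patterns = {
    ("NC", "AQ"): (1, 0),                 # noun adjective -> adjective noun
    ("NC", "AQ", "CC", "AQ"): (1, 2, 3, 0),  # move the noun after the adjectives
}

paths = reordered_paths(tags, patterns)
for path in sorted(paths):
    print(" ".join(words[i] for i in path))
```

Because only spans that match a learnt pattern are permuted, the number of alternative paths stays small compared with unconstrained reordering, which is the sense in which the extended graph is highly constrained.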
