Phrase Linguistic Classification and Generalization for Improving Statistical Machine Translation

This paper proposes a method for incorporating linguistic information about single-word and compound verbs, as a first step towards an SMT model based on linguistically classified phrases. Replacing these verb structures with the base form of the head verb improves statistical word alignment, gives better estimates for the translation model, and lets the system generalize to verb forms unseen in training. Preliminary experiments on the English-Spanish language pair are reported, and directions for future research are outlined.
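The verb substitution described above can be illustrated with a minimal sketch. It is not the authors' implementation: the coarse tag set (`AUX`, `VERB`), the `V[lemma]` placeholder format, the function name `generalize_verbs`, and the rule that the head verb is the last token of a contiguous verb group are all assumptions made for illustration. The input is assumed to be already tagged and lemmatized.

```python
from typing import List, Tuple

# (surface form, coarse POS tag, lemma); the tag names are hypothetical.
Token = Tuple[str, str, str]

VERB_TAGS = {"AUX", "VERB"}  # assumed tags marking members of a verb group


def generalize_verbs(tokens: List[Token]) -> List[str]:
    """Replace each contiguous verb group with a placeholder built from
    the lemma of its head verb (assumed to be the group's last token)."""
    out: List[str] = []
    group: List[Token] = []

    def flush() -> None:
        # Emit one placeholder for the pending verb group, if any.
        if group:
            head_lemma = group[-1][2]
            out.append(f"V[{head_lemma}]")
            group.clear()

    for tok in tokens:
        if tok[1] in VERB_TAGS:
            group.append(tok)
        else:
            flush()
            out.append(tok[0])
    flush()
    return out


sentence = [
    ("she", "PRON", "she"),
    ("has", "AUX", "have"),
    ("been", "AUX", "be"),
    ("working", "VERB", "work"),
    ("here", "ADV", "here"),
]
print(" ".join(generalize_verbs(sentence)))
```

In this sketch, "has been working" and a simple form such as "works" would map to the same `V[work]` token, which is the kind of generalization across verb forms that the abstract describes for alignment and translation.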
