Improving Phrase-Based Statistical Machine Translation with Morpho-Syntactic Analysis and Transformation

This paper presents our study of exploiting morpho-syntactic information for phrase-based statistical machine translation (SMT). For morphological transformation, we use hand-crafted transformational rules. For syntactic transformation, we propose a transformational model based on Bayes’ formula. The model is trained using a bilingual corpus and a broad coverage parser of the source language. The morphological and syntactic transformations are used in the preprocessing phase of a SMT system. This preprocessing method is applicable to language pairs in which the target language is poor in resources. We applied the proposed method to translation from English to Vietnamese. Our experiments showed a BLEU-score improvement of more than 3.28% in comparison with Pharaoh, a state-of-the-art phrase-based SMT system.

[1]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[2]  PietraVincent J. Della,et al.  The mathematics of statistical machine translation , 1993 .

[3]  Hermann Ney,et al.  Improving SMT quality with morpho-syntactic analysis , 2000, COLING.

[4]  Eugene Charniak,et al.  A Maximum-Entropy-Inspired Parser , 2000, ANLP.

[5]  Hermann Ney,et al.  Improved Statistical Alignment Models , 2000, ACL.

[6]  Daniel Marcu,et al.  A Phrase-Based,Joint Probability Model for Statistical Machine Translation , 2002, EMNLP.

[7]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[8]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[9]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[10]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[11]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[12]  Daniel M. Bikel,et al.  Intricacies of Collins’ Parsing Model , 2004, CL.

[13]  Fei Xia,et al.  Improving a Statistical MT System with Automatically Learned Rewrite Patterns , 2004, COLING.

[14]  Philipp Koehn,et al.  Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models , 2004, AMTA.

[15]  Young-Suk Lee,et al.  Morphological Analysis for Statistical Machine Translation , 2004, NAACL.

[16]  Anoop Sarkar,et al.  Discriminative Reranking for Machine Translation , 2004, NAACL.

[17]  Hermann Ney,et al.  The Alignment Template Approach to Statistical Machine Translation , 2004, CL.

[18]  Alexander M. Fraser,et al.  A Smorgasbord of Features for Statistical Machine Translation , 2004, NAACL.

[19]  I. Dan Melamed,et al.  Statistical Machine Translation by Parsing , 2004, ACL.

[20]  Sharon Goldwater,et al.  Improving Statistical MT through Morphological Analysis , 2005, HLT.

[21]  Philipp Koehn,et al.  Clause Restructuring for Statistical Machine Translation , 2005, ACL.