Applying Morphology Generation Models to Machine Translation

We improve the quality of statistical machine translation (SMT) by applying models that predict word forms from their stems using extensive morphological and syntactic information from both the source and target languages. Our inflection generation models are trained independently of the SMT system. We investigate different ways of combining the inflection prediction component with the SMT system by training the base MT system on fully inflected forms or on word stems. We applied our inflection generation models in translating English into two morphologically complex languages, Russian and Arabic, and show that our model improves the quality of SMT over both phrasal and syntax-based SMT systems according to BLEU and human judgements.

[1]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[2]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[3]  Philipp Koehn,et al.  Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models , 2004, AMTA.

[4]  Young-Suk Lee,et al.  Morphological Analysis for Statistical Machine Translation , 2004, NAACL.

[5]  Hermann Ney,et al.  Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information , 2004, CL.

[6]  Sharon Goldwater,et al.  Improving Statistical MT through Morphological Analysis , 2005, HLT.

[7]  Chris Quirk,et al.  Dependency Treelet Translation: Syntactically Informed Phrasal SMT , 2005, ACL.

[8]  Kemal Oflazer,et al.  Initial Explorations in English to Turkish Statistical Machine Translation , 2006, WMT@HLT-NAACL.

[9]  Andrew Y. Ng,et al.  Solving the Problem of Cascading Errors: Approximate Bayesian Inference for Linguistic Annotation Pipelines , 2006, EMNLP.

[10]  Daniel Marcu,et al.  Capitalizing Machine Translation , 2006, NAACL.

[11]  Nizar Habash,et al.  Permission is granted to quote short excerpts and to reproduce figures and tables from this report, provided that the source of such material is fully acknowledged. Arabic Preprocessing Schemes for Statistical Machine Translation , 2006 .

[12]  Kristina Toutanova,et al.  Generating Complex Morphology for Machine Translation , 2007, ACL.

[13]  Xiaodong He Using Word-Dependent Transition Models in HMM-Based Word Alignment for Statistical Machine Translation , 2007, WMT@ACL.

[14]  Alexander M. Fraser,et al.  Squibs and Discussions: Measuring Word Alignment Quality for Statistical Machine Translation , 2007, CL.

[15]  Philipp Koehn,et al.  Factored Translation Models , 2007, EMNLP.

[16]  Kristina Toutanova,et al.  Generating Case Markers in Machine Translation , 2007, NAACL.

[17]  Wolfgang Macherey,et al.  Lattice-based Minimum Error Rate Training for Statistical Machine Translation , 2008, EMNLP.