Modeling Inflection and Word-Formation in SMT

The current state of the art in statistical machine translation (SMT) suffers from data sparsity and insufficient modeling power when translating into morphologically rich languages. We model both inflection and word-formation for the task of translating into German. We translate from English words to an underspecified German representation and then use linear-chain conditional random fields (CRFs) to predict the fully specified German representation. We show that improved modeling of inflection and word-formation leads to improved SMT performance.
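Predicting the fully specified representation is a sequence-labeling problem. The sketch below is a minimal illustration, not the authors' system: it shows how Viterbi decoding in a linear-chain CRF picks the best label sequence for an underspecified German phrase. Everything specific here is a hypothetical assumption: the token markup (`die<+ART>`), the feature templates, the case-only label set, and the hand-set weights, which a real CRF would learn from data (typically while predicting several morphological features, not just case).

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label set: "-" for tokens that are not inflected, plus four cases.
LABELS: Tuple[str, ...] = ("-", "Nom", "Acc", "Dat", "Gen")

Weights = Dict[Tuple[str, str], float]


def features(tokens: Sequence[str], i: int) -> List[str]:
    """Hypothetical feature templates: the token itself and its left neighbor."""
    prev = tokens[i - 1] if i > 0 else "<s>"
    return [f"w={tokens[i]}", f"prev={prev}"]


def viterbi(tokens: Sequence[str], emit_w: Weights, trans_w: Weights,
            labels: Sequence[str] = LABELS) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    score(y) = sum_i [emit(i, y_i) + trans_w[(y_{i-1}, y_i)]], with y_{-1} = "<s>".
    The partition function is the same for every y, so argmax decoding skips it.
    """
    n = len(tokens)
    if n == 0:
        return []

    def emit(i: int, y: str) -> float:
        # Sum the weights of all (feature, label) pairs fired at position i.
        return sum(emit_w.get((f, y), 0.0) for f in features(tokens, i))

    # best[i][y]: score of the best prefix ending at position i with label y.
    best = [{y: trans_w.get(("<s>", y), 0.0) + emit(0, y) for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for y in labels:
            # Best predecessor label for y, including the transition weight.
            prev_y = max(labels, key=lambda p: best[i - 1][p] + trans_w.get((p, y), 0.0))
            best[i][y] = best[i - 1][prev_y] + trans_w.get((prev_y, y), 0.0) + emit(i, y)
            back[i][y] = prev_y

    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[n - 1][lab])
    path = [y]
    for i in range(n - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


# Hand-set, hypothetical weights (a real CRF would learn these from data).
EMIT_W: Weights = {
    ("w=mit<APPR>", "-"): 3.0,       # prepositions get no case label in this sketch
    ("prev=mit<APPR>", "Dat"): 2.0,  # "mit" governs the dative
    ("w=Frau<+NN>", "Nom"): 0.5,     # weak local preference for nominative
}
TRANS_W: Weights = {("Dat", "Dat"): 1.0}  # reward case agreement inside the phrase

if __name__ == "__main__":
    print(viterbi(["mit<APPR>", "die<+ART>", "Frau<+NN>"], EMIT_W, TRANS_W))
```

Running the script prints `['-', 'Dat', 'Dat']`: the transition weight for ("Dat", "Dat") pulls the noun to dative despite its small local preference for nominative. With an empty transition table the noun comes out as `Nom`, which is the kind of agreement error a model that ignores neighboring labels can make.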
