Using Statistical Machine Translation

We present an approach to generating text in morphologically rich languages using statistical machine translation. Given a sequence of lemmas and any subset of their morphological features, we produce the inflected word forms. In tests on Arabic, a morphologically rich language, our models reach 92.1% accuracy starting from lemmas alone, and 98.9% accuracy when all the gold features are provided.
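
To make the task setup concrete, the following is a minimal sketch of how the input side might be encoded and how accuracy might be scored. It is not the authors' implementation: the `lemma|name=value` token format, the feature names in `FEATURE_ORDER`, and the helper names are all assumptions made for illustration, and the placeholder strings are not real data. The idea shown is only that each source token carries a lemma plus whichever features happen to be available, and that accuracy is the fraction of predicted word forms that match the reference.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical feature inventory and order; the abstract does not specify one.
FEATURE_ORDER = ("pos", "gen", "num")


def source_token(lemma: str, features: Optional[Dict[str, str]] = None) -> str:
    """Join a lemma with whichever features are available (possibly none)."""
    features = features or {}
    parts = [lemma]
    for name in FEATURE_ORDER:
        if name in features:
            parts.append(f"{name}={features[name]}")
    return "|".join(parts)


def source_sentence(
    lemmas: Sequence[str],
    feature_sets: Sequence[Optional[Dict[str, str]]],
) -> str:
    """Build one source-side line: one annotated token per lemma."""
    if len(lemmas) != len(feature_sets):
        raise ValueError("need one feature set (or None) per lemma")
    return " ".join(source_token(l, f) for l, f in zip(lemmas, feature_sets))


def word_accuracy(predicted: List[str], reference: List[str]) -> float:
    """Fraction of positions where the predicted form equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must be the same length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)
```

Under these assumptions, a lemma-only input is simply the lemma sequence itself, while a fully annotated input attaches every feature to every token; a translation system would then be trained to map such source lines to the inflected target sentence.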
