Statistical Machine Translation

We give a brief introduction to statistical machine translation (SMT) for Semitic languages, along with an overview of machine translation approaches. We discuss the special considerations that should be taken into account when developing SMT systems for Semitic languages, and we review segmentation techniques for Semitic SMT. Finally, we present a detailed guide to building an SMT system using freely available resources.
