Selective Combination of Pivot and Direct Statistical Machine Translation Models

In this paper, we propose an approach that selectively combines pivot and direct statistical machine translation (SMT) models to improve translation quality, using Persian-Arabic SMT as a case study. We show BLEU improvements ranging from 0.4 to 3.1 points across different sizes of direct training corpus, together with a large reduction in the size of the pivot translation model.
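The abstract does not specify how the two models are combined, so the following is only a rough illustration under stated assumptions. It shows standard phrase-table triangulation, which builds a Persian-Arabic pivot model by marginalising over pivot phrases, p(t|s) = Σ_p p(t|p) · p(p|s). It then applies a *hypothetical* selection rule: keep every direct entry, and add pivot entries only for source phrases the direct table does not cover. The selection rule, the function names `triangulate` and `select_combine`, and the toy phrases are assumptions for illustration, not the paper's method.

```python
from collections import defaultdict
from typing import Dict

# A phrase table maps a source phrase to {target phrase: probability}.
Table = Dict[str, Dict[str, float]]


def triangulate(src_piv: Table, piv_tgt: Table) -> Table:
    """Standard triangulation: p(t|s) = sum over pivot p of p(t|p) * p(p|s)."""
    out: Table = {}
    for s, pivots in src_piv.items():
        scores: Dict[str, float] = defaultdict(float)
        for p, p_given_s in pivots.items():
            # Pivot phrases with no pivot->target entry contribute nothing.
            for t, t_given_p in piv_tgt.get(p, {}).items():
                scores[t] += t_given_p * p_given_s
        if scores:
            out[s] = dict(scores)
    return out


def select_combine(direct: Table, pivot: Table) -> Table:
    """HYPOTHETICAL selection rule (an assumption, not the paper's method):
    direct entries always win; pivot entries are added only for source
    phrases missing from the direct table, which keeps the pivot part small."""
    combined: Table = {s: dict(ts) for s, ts in direct.items()}
    for s, ts in pivot.items():
        if s not in combined:
            combined[s] = dict(ts)
    return combined


if __name__ == "__main__":
    # Toy tables with placeholder phrases (s* = Persian, p* = pivot, t* = Arabic).
    src_piv = {"s1": {"p1": 0.75, "p2": 0.25}, "s2": {"p2": 1.0}}
    piv_tgt = {"p1": {"t1": 1.0}, "p2": {"t1": 0.5, "t2": 0.5}}
    pivot_table = triangulate(src_piv, piv_tgt)
    direct_table = {"s1": {"t3": 1.0}}
    print(select_combine(direct_table, pivot_table))
```

With these toy numbers, `s1` triangulates to p(t1|s1) = 1.0·0.75 + 0.5·0.25 = 0.875 and p(t2|s1) = 0.125, but the selection rule discards those pivot entries because the direct table already covers `s1`; only `s2` is taken from the pivot model. Real systems would carry several feature scores per phrase pair and tune them (e.g. with MERT), which this sketch omits.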