Hybrid Arabic-French machine translation using syntactic re-ordering and morphological pre-processing

Hybrid Arabic-to-French SMT using rule-based pre-processing and language analysis.
Morphological reduction rules that simplify the morphology of Arabic.
Swapping rules for structural matching of pronouns and verbs.
A gain in BLEU score after applying some of these rules.
A learning curve showing the findings under scarce- and large-resource conditions.

Arabic is a highly inflected and morpho-syntactically complex language that differs in many ways from several heavily studied languages. It therefore presents significant challenges for Natural Language Processing (NLP), and specifically for Machine Translation (MT), and may require careful pre-processing. This paper examines how Statistical Machine Translation (SMT) can be improved using rule-based pre-processing and language analysis. We describe a hybrid translation approach that couples an Arabic-French statistical machine translation system built on the Moses decoder with additional morphological rules that reduce the morphology of the source language (Arabic) to a level closer to that of the target language (French). We also introduce swapping rules for structural matching between the source and target languages. Two structural changes, involving the positions of pronouns and of verbs in the source and target languages, have been attempted. The results show improved translation quality and a gain in BLEU score after introducing a pre-processing scheme for Arabic and applying these rules, which are based on morphological variations and on re-ordering verbs in the source language (Arabic), turning VS into SV constructions, according to their positions in the target language (French). Furthermore, a learning curve shows the improvement in BLEU score under scarce- and large-resource conditions.
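Because the reported gains are measured in BLEU, a minimal sketch of corpus-level BLEU (Papineni et al., 2002) may help in reading them. It assumes one tokenized reference per sentence, uniform weights over 1- to 4-grams and no smoothing; the function names are illustrative, and the paper's exact evaluation setup (tokenization, number of references, scoring script) is not given here, so this sketch need not reproduce the reported scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU: one reference per hypothesis, uniform weights, no smoothing."""
    clipped = [0] * max_n  # matched n-grams, clipped by reference counts
    total = [0] * max_n    # n-grams proposed by the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            clipped[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            total[n - 1] += sum(h.values())
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0  # log(0) is undefined without smoothing
    # Geometric mean of the modified n-gram precisions.
    log_p = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    # Brevity penalty: punish output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_p)
```

For example, a hypothesis that is a correct four-word prefix of a six-word reference has every n-gram precision equal to 1, but the brevity penalty reduces its score to exp(1 - 6/4), about 0.61.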
The proposed approach achieves these gains without increasing the amount of training data or radically changing the algorithms of the translation or training engines.
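The idea of reducing Arabic morphology so that it sits closer to French can be illustrated with clitic segmentation, one common form of such pre-processing. The sketch below is hypothetical: it works on Buckwalter-transliterated tokens, uses a tiny hand-picked list of clitics, and only accepts a split when the remaining stem is in a supplied lexicon, so that a word such as wld ("boy") is not wrongly split into w+ ld. It is not the rule set used in the paper; real systems usually rely on a full morphological analyzer rather than a lexicon lookup.

```python
from __future__ import annotations

# Hypothetical clitic inventory in Buckwalter transliteration (illustrative only).
PROCLITICS = ["w", "f"]        # conjunctions wa+ "and", fa+ "so"
ENCLITICS = ["hA", "hm", "h"]  # pronouns: "her/it", "them", "him/it"


def segment(token: str, lexicon: set[str]) -> list[str]:
    """Split at most one proclitic and one enclitic off `token`.

    A split is accepted only if the remaining stem is in `lexicon`; the
    unsplit token is tried first, so known whole words stay intact.
    """
    for p in [""] + PROCLITICS:
        if not token.startswith(p):
            continue
        rest = token[len(p):]
        for s in [""] + ENCLITICS:
            if s and not rest.endswith(s):
                continue
            stem = rest[: len(rest) - len(s)]
            if stem in lexicon:
                return ([p + "+"] if p else []) + [stem] + (["+" + s] if s else [])
    return [token]  # unknown word: leave it unsegmented


def preprocess(sentence: str, lexicon: set[str]) -> str:
    """Segment every whitespace-separated token of a transliterated sentence."""
    return " ".join(part for tok in sentence.split() for part in segment(tok, lexicon))


print(preprocess("wktbhA wld", {"ktb", "wld"}))  # w+ ktb +hA wld
```

Splitting wktbhA ("and he wrote it") into w+ ktb +hA yields three units that can align one-to-one with French "et", "écrit" and "l'", instead of one Arabic token facing several French words.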

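The two swapping rules, VS into SV and the position of object pronouns, can be sketched over a clause whose tokens already carry labels. The labels V, PRON, SBJ and O are hypothetical and are assumed to come from an upstream tagger or chunker that identifies the verb, its object-pronoun clitics and the post-verbal subject; detecting that subject reliably is the hard part in practice and is not shown. The sketch moves the subject in front of the verb group and then places object clitics before the verb, mirroring French order as in "le garçon l'a écrite".

```python
from __future__ import annotations

# Each token is (form, label); labels are hypothetical chunk tags:
# "V" verb, "PRON" object-pronoun clitic, "SBJ" subject, "O" other.


def vs_to_sv(clause: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Move a post-verbal subject run in front of the first verb group (VS -> SV)."""
    for i, (_, label) in enumerate(clause):
        if label == "V":
            break
    else:
        return list(clause)  # no verb: nothing to reorder
    j = i + 1  # verb group = verb plus any attached object-pronoun clitics
    while j < len(clause) and clause[j][1] == "PRON":
        j += 1
    k = j  # subject run immediately after the verb group
    while k < len(clause) and clause[k][1] == "SBJ":
        k += 1
    if k == j:
        return list(clause)  # already SV, or the subject is dropped
    return clause[:i] + clause[j:k] + clause[i:j] + clause[k:]


def clitics_before_verb(clause: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Place object-pronoun clitics before their verb, as French does."""
    out, i = [], 0
    while i < len(clause):
        if clause[i][1] == "V":
            j = i + 1
            while j < len(clause) and clause[j][1] == "PRON":
                j += 1
            out.extend(clause[i + 1:j])  # pronouns first
            out.append(clause[i])        # then the verb
            i = j
        else:
            out.append(clause[i])
            i += 1
    return out


clause = [("ktb", "V"), ("+hA", "PRON"), ("Alwld", "SBJ")]  # "wrote it the-boy"
print([w for w, _ in clitics_before_verb(vs_to_sv(clause))])  # ['Alwld', '+hA', 'ktb']
```

Applying the subject move before the clitic move keeps the verb and its clitics together as one group, so the subject is never inserted between them.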