Evaluating the Impact of Using a Domain-specific Bilingual Lexicon on the Performance of a Hybrid Machine Translation Approach

This paper describes an Example-Based Machine Translation prototype and evaluates the impact of using a domain-specific vocabulary on its performance. The prototype is based on a hybrid approach that needs only monolingual texts in the target language: it combines translation candidates returned by a cross-language search engine with translation hypotheses provided by a finite-state transducer. The results of this combination are scored against a statistical language model of the target language in order to obtain the n-best translations. To measure the performance of this hybrid approach, we conducted several experiments using corpora from two domains: the European Parliament proceedings (Europarl) and the European Medicines Agency documents (Emea). The results show that the proposed approach outperforms the state-of-the-art Statistical Machine Translation system Moses when the texts to translate belong to the specialized domain.
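
The final step, ranking combined translation hypotheses with a target-language model to keep the n-best, can be illustrated with a minimal sketch. It is not the prototype's implementation: a toy add-one smoothed bigram model stands in for the statistical language model, the candidate list is assumed to be already produced by the earlier stages, and the function names (train_bigram_lm, log_prob, n_best) are hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized target-language sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence (toy stand-in for a real LM)."""
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def n_best(candidates, unigrams, bigrams, n=1):
    """Return the n candidates with the highest language-model score."""
    ranked = sorted(
        candidates,
        key=lambda cand: log_prob(cand, unigrams, bigrams),
        reverse=True,
    )
    return ranked[:n]


# Illustrative data only: a tiny monolingual target corpus and two hypotheses.
corpus = ["the patient takes the medicine", "the medicine is taken daily"]
hypotheses = ["medicine the takes patient the", "the patient takes the medicine"]
unigrams, bigrams = train_bigram_lm(corpus)
best = n_best(hypotheses, unigrams, bigrams, n=1)
```

Because the model is trained on target-language text alone, this kind of ranking needs no parallel corpus, which matches the monolingual-only requirement stated above. A real system would use a higher-order model with stronger smoothing and would typically account for hypothesis length.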
