Bilingual Language Model for English-Arabic Technical Translation

The rapid growth of new scientific publications increases the need for a reliable, effective automatic machine translation (AMT) system that translates from English, the common language of publication, into other languages. A statistical machine translation (SMT) model built for one text domain often fails when applied to another. This paper characterizes language domains and their behavior in SMT, and experiments with adapting an SMT model to translate scientific text collected from artificial intelligence publications. The effectiveness of a bilingual language model is tested against the typical n-gram language model, and fill-up and back-off techniques are used to combine phrase tables from different domains. Just as not every person can translate an artificial intelligence book without strong knowledge of the field, we suggest that for an AMT system to handle different domains it must be trained on in-domain parallel data, with word weights adjusted across domains so that the model learns to differentiate between the different meanings of the same word in different domains.
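As an illustration of the fill-up idea mentioned in the abstract, the following is a minimal sketch, not the paper's implementation. It assumes the usual formulation of fill-up: every in-domain phrase pair is kept, out-of-domain pairs are added only when the in-domain table lacks them, and one extra provenance feature records where each entry came from. The function name `fill_up`, the table layout, the 0/1 encoding of the provenance feature, and all scores are hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (source phrase, target phrase) -> list of feature scores.
PhrasePair = Tuple[str, str]
PhraseTable = Dict[PhrasePair, List[float]]


def fill_up(in_domain: PhraseTable, out_domain: PhraseTable) -> PhraseTable:
    """Merge two phrase tables, preferring in-domain entries.

    Each merged entry gets one extra provenance feature appended:
    0.0 for in-domain entries, 1.0 for entries filled up from the
    out-of-domain table (an assumed encoding for this sketch).
    """
    merged: PhraseTable = {}
    # Keep every in-domain phrase pair with its original scores.
    for pair, scores in in_domain.items():
        merged[pair] = scores + [0.0]
    # Add out-of-domain pairs only when the in-domain table lacks them.
    for pair, scores in out_domain.items():
        if pair not in merged:
            merged[pair] = scores + [1.0]
    return merged


# Toy tables with made-up scores, for illustration only.
in_table: PhraseTable = {
    ("learning", "تعلم"): [0.7],
    ("network", "شبكة"): [0.9],
}
out_table: PhraseTable = {
    ("learning", "تعلم"): [0.2],   # already in-domain, so it is skipped
    ("learning", "تعليم"): [0.5],  # different target phrase, so it is added
    ("agent", "وكيل"): [0.6],      # missing in-domain, so it is added
}

merged_table = fill_up(in_table, out_table)
for (src, tgt), feats in merged_table.items():
    print(src, tgt, feats)
```

In this sketch the provenance feature would receive its own weight during tuning, so the decoder could learn how much to trust filled-up entries relative to in-domain ones.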
