Recurrent neural network language model for English-Indonesian Machine Translation: Experimental study

Statistical and neural language models currently dominate research in machine translation. Statistical machine translation is the faster approach today, but it is weaker in accuracy. In contrast, neural approaches achieve higher accuracy but are computationally very slow. This research compares a neural language model based on a Recurrent Neural Network (RNN) with a statistical n-gram language model for two-way English-Indonesian Machine Translation (MT). Perplexity evaluation of both models shows that the RNN obtains the better result. In addition, the Bilingual Evaluation Understudy (BLEU) and Rank-based Intuitive Bilingual Evaluation Score (RIBES) values are 1.1 and 1.6 higher, respectively, than those obtained with the statistical model.
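To make the perplexity comparison concrete, the following is a minimal sketch of how perplexity can be computed for a statistical n-gram language model. It is not the setup used in this study: the bigram order, the add-one smoothing, the `<unk>` handling, the function names `train_bigram` and `perplexity`, and the toy Indonesian sentences are all assumptions chosen for illustration. Lower perplexity means the model assigns higher probability to the evaluated text.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their contexts from whitespace-tokenised sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {"<unk>"}  # reserve a slot for out-of-vocabulary words
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return context_counts, bigram_counts, vocab

def perplexity(sentences, context_counts, bigram_counts, vocab):
    """Perplexity under an add-one smoothed bigram model (illustrative only)."""
    vocab_size = len(vocab)
    log_prob_sum = 0.0
    n_predictions = 0
    for sentence in sentences:
        words = [w if w in vocab else "<unk>" for w in sentence.split()]
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            # Add-one smoothing keeps unseen bigrams from getting zero probability.
            p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
            log_prob_sum += math.log(p)
            n_predictions += 1
    # Perplexity is the exponential of the average negative log-likelihood.
    return math.exp(-log_prob_sum / n_predictions)

# Toy corpus (hypothetical data, not from the study).
train_sentences = ["saya makan nasi", "saya minum air"]
counts = train_bigram(train_sentences)
pp_seen = perplexity(["saya makan nasi"], *counts)
pp_unseen = perplexity(["air makan saya"], *counts)
print(pp_seen, pp_unseen)
```

In this toy setting, the sentence whose word order matches the training data receives the lower perplexity. An RNN language model would be scored the same way, with the smoothed bigram probability replaced by the network's predicted probability for each next word.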
