Paper information - Development of Indonesian-Japanese statistical machine translation using lemma translation and additional post-process

Although research on statistical machine translation has grown rapidly, little work has addressed Indonesian-Japanese statistical machine translation. Previous research on this language pair revealed several problems in the translation process, including low corpus coverage, unknown words, and sentence reordering. In this research, we propose two methods to address these problems: lemma translation with generated surface forms, and an additional post-process. Lemma translation uses the lemma and POS tag of each word in the translation process. The additional post-process consists of rule-based katakana translation and unknown word substitution. Experimental data were collected from JLPT (Japanese Language Proficiency Test) Level 3, with a total of 1132 sentences. Experiments with these methods showed an improvement over the baseline system, with a 116% increase in BLEU score for Japanese-to-Indonesian translation and a 26% increase in BLEU score for Indonesian-to-Japanese translation.
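The abstract names the post-process steps but does not spell out their rules, so the following is only a minimal sketch of how unknown word substitution and rule-based katakana handling might be chained on decoder output. The mapping table, the `substitutions` dictionary, and both function names are hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical, deliberately tiny katakana-to-Latin table. A real rule set
# would cover the full syllabary, small-kana combinations, and Indonesian
# spelling conventions.
KATAKANA_TO_LATIN = {
    "カ": "ka", "メ": "me", "ラ": "ra", "テ": "te", "レ": "re", "ビ": "bi",
    "ホ": "ho", "ル": "ru", "ト": "to", "マ": "ma",
}
LONG_VOWEL_MARK = "ー"
# Matches tokens made up only of characters from the Unicode Katakana block.
KATAKANA_WORD = re.compile(r"^[\u30A0-\u30FF]+$")


def transliterate_katakana(word):
    """Map each katakana character to Latin letters, dropping the long-vowel mark.

    Returns None when the (assumed) rule table does not cover the word.
    """
    out = []
    for ch in word:
        if ch == LONG_VOWEL_MARK:
            continue
        if ch not in KATAKANA_TO_LATIN:
            return None
        out.append(KATAKANA_TO_LATIN[ch])
    return "".join(out)


def post_process(tokens, substitutions):
    """Replace tokens that the decoder left untranslated.

    substitutions is a hypothetical lookup used for unknown word substitution.
    Tokens covered by neither step are passed through unchanged.
    """
    result = []
    for tok in tokens:
        if tok in substitutions:
            result.append(substitutions[tok])
        elif KATAKANA_WORD.match(tok):
            result.append(transliterate_katakana(tok) or tok)
        else:
            result.append(tok)
    return result


if __name__ == "__main__":
    print(post_process(["saya", "カメラ", "猫"], {"猫": "kucing"}))
```

In this sketch the dictionary lookup runs before the katakana rules, so an explicit substitution always takes priority over transliteration; that ordering is an assumption made for the example.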

Ayu Purwarianti | Mohammad Anugrah Sulaeman
