Indonesian-Japanese CLIR Using Only Limited Resource

Our aim is to build a CLIR system for a language pair in which the source language (e.g. Indonesian) has limited language resources. Our Indonesian-Japanese CLIR system employs an existing Japanese IR system, and we focus our research on Indonesian-Japanese query translation. Query translation with limited resources faces two problems: out-of-vocabulary (OOV) words and translation ambiguity. We handle the OOV problem using target-language resources (an English-Japanese dictionary and a Japanese proper name dictionary). We handle translation ambiguity by filtering translation candidates against a Japanese monolingual corpus, selecting the final translation set using the mutual information score and the TFxIDF score. Results on the NTCIR-3 (NII-NACSIS Test Collection for IR Systems) Web Retrieval Task show that this translation method achieved a higher IR score than transitive machine translation using the Kataku (Indonesian-English) and Babelfish/Excite (English-Japanese) engines. The best result achieved about 49% of the monolingual retrieval performance.
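
The translation filtering step can be illustrated with a minimal sketch. It assumes, hypothetically, that mutual information is computed as pointwise mutual information over document-level co-occurrence in the target-language corpus, and that the chosen translation set is the one with the highest summed pairwise score. The abstract does not give the exact formula, and the TFxIDF component is not covered here. The function names, the toy corpus, and the romanized Japanese tokens are illustrative only.

```python
import math
from itertools import product


def pmi(a, b, docs):
    """Pointwise mutual information of terms a and b over document co-occurrence."""
    n = len(docs)
    freq_a = sum(1 for d in docs if a in d)
    freq_b = sum(1 for d in docs if b in d)
    freq_ab = sum(1 for d in docs if a in d and b in d)
    if freq_ab == 0:
        # The pair never co-occurs, so treat it as maximally incoherent.
        return float("-inf")
    return math.log((freq_ab * n) / (freq_a * freq_b))


def select_translations(candidates, docs):
    """Pick one candidate per query term, maximizing the summed pairwise PMI.

    candidates: one list of target-language candidates per source query term.
    docs: the monolingual corpus, each document given as a set of tokens.
    """
    def coherence(combo):
        return sum(
            pmi(a, b, docs)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )

    return max(product(*candidates), key=coherence)


# Toy example: the Indonesian query "bank bunga", where "bunga" is ambiguous
# between "flower" (hana) and "interest" (rishi).
corpus = [
    {"ginkou", "rishi", "kinri"},
    {"ginkou", "rishi"},
    {"hana", "niwa"},
    {"hana", "haru"},
]
query_candidates = [["ginkou"], ["hana", "rishi"]]
selected = select_translations(query_candidates, corpus)
```

Enumerating every combination is exponential in query length, which is acceptable for short queries; longer queries would need a greedy or beam-style search instead.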
