KUNLP System for NTCIR-4 Korean-English Cross-Language Information Retrieval
This paper describes our Korean-English cross-language information retrieval system for NTCIR-4. The system is based on a query translation approach that uses a bilingual dictionary and co-occurrence information between English terms in an English corpus. This year, we focused on the translation of unknown words. We expanded the existing bilingual dictionary by manually gathering Korean-English translation pairs for some Korean words from the Web. Unknown words not contained in the expanded bilingual dictionary are automatically transliterated into English using a pre-constructed mapping table. Some issues in processing Korean queries and documents, such as the identification of Korean phrases, are also described. On the NTCIR-4 evaluation collections, the performance of our system under relax scoring is 30.25% for the description query type, 33.33% for the title query type, and 32.47% for the combined description and narrative query type. Post-submission experiments show that the expanded dictionary and the transliteration mechanism improve the performance of our system.
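The query translation idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary entries, the co-occurrence counts, the syllable-level mapping table, and all function names are hypothetical toy values chosen for illustration, and the scoring rule (sum of pairwise co-occurrence counts) is an assumed simplification. The sketch looks up each Korean query term in a bilingual dictionary, falls back to table-based transliteration for unknown words, and selects the combination of English candidates that co-occur most often.

```python
from itertools import combinations, product

# Hypothetical toy bilingual dictionary: Korean term -> English candidates.
BILINGUAL_DICT = {
    "은행": ["bank", "ginkgo"],
    "금리": ["interest", "rate"],
}

# Hypothetical co-occurrence counts between English terms in an English corpus.
COOCCURRENCE = {
    frozenset(["bank", "interest"]): 120,
    frozenset(["bank", "rate"]): 80,
    frozenset(["ginkgo", "interest"]): 1,
}

# Hypothetical syllable-level mapping table used for transliteration.
SYLLABLE_TO_LATIN = {"컴": "com", "퓨": "pu", "터": "ter"}


def transliterate(word):
    """Map each Korean syllable to a Latin string; unmapped syllables are dropped."""
    return "".join(SYLLABLE_TO_LATIN.get(ch, "") for ch in word)


def candidates(term):
    """Dictionary translations if available, otherwise a single transliteration."""
    return BILINGUAL_DICT.get(term) or [transliterate(term)]


def cooccurrence_score(words):
    """Sum of pairwise co-occurrence counts for one candidate combination."""
    return sum(COOCCURRENCE.get(frozenset(pair), 0)
               for pair in combinations(words, 2))


def translate_query(terms):
    """Pick the candidate combination with the highest co-occurrence score."""
    options = [candidates(t) for t in terms]
    return list(max(product(*options), key=cooccurrence_score))


if __name__ == "__main__":
    print(translate_query(["은행", "금리"]))
    print(translate_query(["컴퓨터"]))
```

Enumerating every combination is only practical for short queries; a real system would need a cheaper selection strategy and a normalized association measure rather than raw counts.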