Paper Information - A Statistical Model for Automatic Extraction of Korean Transliterated Foreign Words

In this paper, we describe an algorithm for extracting transliterated foreign words from Korean text. The proposed method reformulates foreign word extraction as a syllable-tagging problem in which each syllable is tagged as either a foreign syllable or a pure Korean syllable. Syllable sequences of Korean strings are modeled by a Hidden Markov Model (HMM) in which each state represents a character with a binary marking that indicates whether the syllable is part of a transliterated foreign word. The proposed method extracts transliterated foreign words with high recall and precision. Moreover, it performs well even when the training corpus is small.
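The syllable-tagging formulation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag names `K` and `F`, the word-start state `START`, the interpolation weight `lam`, the smoothing constant `alpha`, and the four-word training list are all assumptions made for illustration. Each HMM state is a (syllable, tag) pair, transitions between states are estimated from tagged words, and Viterbi decoding picks the most likely tag sequence.

```python
import math
from collections import Counter

TAGS = ("K", "F")     # K: pure Korean syllable, F: part of a transliterated foreign word
START = ("<s>", "K")  # hypothetical word-start state (an assumption of this sketch)

def train(tagged_words):
    """Count (syllable, tag) states and state bigrams from (word, tag string) pairs."""
    unigrams, contexts, bigrams = Counter(), Counter(), Counter()
    for word, tags in tagged_words:
        prev = START
        for state in zip(word, tags):  # one precomposed Hangul character = one syllable
            unigrams[state] += 1
            contexts[prev] += 1
            bigrams[(prev, state)] += 1
            prev = state
    n_states = len(TAGS) * (len({syl for syl, _ in unigrams}) + 1)
    return {"uni": unigrams, "ctx": contexts, "bi": bigrams,
            "total": sum(unigrams.values()), "n_states": n_states}

def transition_logp(model, prev, cur, lam=0.7, alpha=0.01):
    """Log P(cur | prev): bigram estimate interpolated with a smoothed unigram estimate."""
    ctx = model["ctx"][prev]
    bi = model["bi"][(prev, cur)] / ctx if ctx else 0.0
    uni = (model["uni"][cur] + alpha) / (model["total"] + alpha * model["n_states"])
    return math.log(lam * bi + (1 - lam) * uni)

def tag(model, syllables):
    """Viterbi decoding over the two candidate states (syllable, K) and (syllable, F)."""
    best = {START[1]: (0.0, [])}  # tag of previous state -> (log-probability, tag path)
    prev_syl = START[0]
    for syl in syllables:
        new_best = {}
        for t in TAGS:
            new_best[t] = max(
                (score + transition_logp(model, (prev_syl, pt), (syl, t)), path + [t])
                for pt, (score, path) in best.items()
            )
        best, prev_syl = new_best, syl
    return max(best.values())[1]

def extract_foreign_words(model, text):
    """Tag each whitespace-separated word and collect maximal runs of F syllables."""
    found = []
    for word in text.split():
        current = ""
        for syl, t in zip(word, tag(model, list(word))):
            if t == "F":
                current += syl
            elif current:
                found.append(current)
                current = ""
        if current:
            found.append(current)
    return found

# Toy training data (illustrative only): each word is paired with one tag per syllable.
TRAINING = [("컴퓨터를", "FFFK"), ("인터넷을", "FFFK"), ("학교에", "KKK"), ("버스를", "FFK")]
model = train(TRAINING)
print(extract_foreign_words(model, "학교에 컴퓨터에 버스를"))
```

In this sketch, the unigram term gives unseen state transitions a nonzero probability, so a syllable sequence such as "컴퓨터" followed by a particle that never followed it in training can still be tagged. A realistic system would need a much larger tagged corpus and a more careful smoothing scheme.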

Key-Sun Choi | Jong-Hoon Oh

[1] Keita Tsuji. Automatic Extraction of Translational Japanese-KATAKANA and English Word Pairs, 2002, Int. J. Comput. Process. Orient. Lang.

[2] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[3] Adam L. Berger, et al. A Maximum Entropy Approach to Natural Language Processing, 1996, CL.

[4] James Davidson, et al. Natural Language Understanding, 1979.

[5] Key-Sun Choi, et al. Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval, 2000, IRAL '00.

[6] Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[7] James F. Allen. Natural language understanding, 1987, Benjamin/Cummings series in computer science.

[8] 이경희, et al. 한국어 문서에서 개체명 인식에 관한 연구 = Study on named entity recognition in Korean text, 2000.

[9] Key-Sun Choi, et al. Effective foreign word extraction for Korean information retrieval, 2002, Inf. Process. Manag.

[10] Hae-Chang Rim, et al. Automatic Word Spacing Using Hidden Markov Model for Refining Korean Text Corpora, 2002, ALR@COLING.

[11] Sung Hyon Myaeng, et al. The Effect of a Proper Handling of Foreign and English Words in Retrieving Korean Text, 1997.

[12] Jae-Seong Lee, et al. Phonetic Similarity Measure for the Korean Transliterations of Foreign Words, 1999.

[13] Biing-Hwang Juang, et al. Hidden Markov Models for Speech Recognition, 1991.

[14] Key-Sun Choi, et al. Japanese term extraction using dictionary hierarchy and machine translation system, 2000.

[15] Key-Sun Choi, et al. Automatic Extraction of Transliterated Foreign Words Using HMM, 2001.