Recognizing Transliteration Equivalence for Enriching Domain-Specific Thesauri

Transliteration renders proper names and technical terms phonetically in another writing system, most often from languages written in the Roman alphabet into languages written in non-Roman scripts, for example from English into Korean, Japanese, or Chinese. "Transliteration equivalence" refers to the set consisting of an original word and all of its possible transliterated forms. Many Korean domain-specific terms are composed of transliterations, so handling transliterations and their transliteration equivalence is essential to constructing and enriching Korean domain-specific thesauri. In this paper, we propose an algorithm that uses machine transliteration to recognize transliteration equivalence, or transliteration pairs, in domain-specific dictionaries. Machine transliteration can serve as one component of a transliteration pair acquisition method by supplying a machine-generated transliterated form. Because the transliteration pair acquisition task is to find phonetic cognates in two languages, it is important to phonetically convert words in one language into the other language, as machine transliteration does, so that their phonetic equivalence can be compared. Our method achieves about 99% precision and 73% recall.
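
The idea of comparing a machine-generated transliteration against a candidate word can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper: the lookup table `TOY_TRANSLITERATOR` is a hypothetical stand-in for a real machine transliteration model, the jamo-level similarity based on `difflib.SequenceMatcher` is a simple substitute for a proper phonetic comparison, and the threshold of 0.8 is an arbitrary assumption.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical stand-in for a machine transliteration model: each English
# word maps to one machine-generated Korean form. A real system would
# generate this form instead of looking it up.
TOY_TRANSLITERATOR = {
    "data": "데이터",
    "digital": "디지털",
    "system": "시스템",
}


def transliterate(english_word):
    """Return a machine-generated Korean form, or None if unavailable."""
    return TOY_TRANSLITERATOR.get(english_word.lower())


def to_jamo(hangul):
    """Decompose Hangul syllables into jamo so that spelling variants
    differing in a single vowel or consonant still compare as close."""
    return unicodedata.normalize("NFD", hangul)


def similarity(a, b):
    """Jamo-level string similarity in [0, 1] (assumed proxy for
    phonetic similarity)."""
    return SequenceMatcher(None, to_jamo(a), to_jamo(b)).ratio()


def is_transliteration_pair(english_word, korean_word, threshold=0.8):
    """Accept the pair when the generated form is close enough to the
    candidate Korean word. The threshold is an illustrative assumption."""
    generated = transliterate(english_word)
    if generated is None:
        return False
    return similarity(generated, korean_word) >= threshold


if __name__ == "__main__":
    # "데이타" is a spelling variant of "데이터" (data).
    print(is_transliteration_pair("data", "데이타"))
    print(is_transliteration_pair("system", "데이타"))
```

In this sketch, a dictionary entry is paired with an English word only when the machine-generated form and the dictionary form are nearly identical at the jamo level, which favors precision over recall; lowering the threshold would trade the former for the latter.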