Semantic Categorization of Contextual Features Based on Wordnet for G-to-P Conversion of Arabic Numerals Combined with Homographic Classifiers

Arabic numerals show a high occurrence-frequency and deliver significant senses, especially in scientific or informative texts. The problem, how to convert Arabic numerals to phonemes with ambiguous classifiers in Korean, is not easily resolved. In this paper, the ambiguities of Arabic numerals combined with homographic classifiers are analyzed and the resolutions for their sense disambiguation based on KorLex (Korean Lexico-Semantic Network) are proposed. Words proceeding or following the Arabic Numerals are categorized into 54 semantic classes based on the lexical hierarchy in KorLex 1.0. The semantic classes are trained to classify the meaning and the reading of Arabic Numerals using a decision tree. The proposed model shows 87.3% accuracy which is 14.1% higher than the baseline.