An Automatic Code Classification System by Using Memory-Based Learning and Information Retrieval Technique
This paper proposes an automatic code classification system for Korean census data that combines an information retrieval technique with memory-based learning. The system converts natural language responses on survey questionnaires into the corresponding numeric codes defined in the standard code book from the Census Bureau. It was trained with memory-based learning and tested on 46,762 industry records and 36,286 occupation records, using 10-fold cross-validation. In the experiments, the proposed system achieved production rates of 99.10% and 92.88% for level 2 and level 5 codes, respectively.
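As a rough illustration of the general idea described in the abstract, the sketch below stores coded responses in memory, weights their terms with TF-IDF (a standard information retrieval weighting), and assigns a new response the code that wins a similarity-weighted vote among its nearest stored neighbours. It is not the paper's implementation: the class name `MemoryBasedCoder`, the helper `cross_validate`, the value `k=3`, the cosine similarity measure, the English example responses, and the two-digit codes are all assumptions made for illustration. The accuracy returned by `cross_validate` is also an assumed metric and may not match how the paper defines production rate.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenizer; real Korean responses would need morphological analysis.
    return text.lower().split()


class MemoryBasedCoder:
    """Hypothetical k-nearest-neighbour coder over TF-IDF vectors."""

    def __init__(self, k=3):
        self.k = k
        self.idf = {}
        self.memory = []  # list of (unit-length TF-IDF vector, code)

    def _vectorize(self, tokens):
        counts = Counter(tokens)
        # Terms never seen in training get weight 0.
        vec = {t: c * self.idf.get(t, 0.0) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def fit(self, responses, codes):
        docs = [tokenize(r) for r in responses]
        df = Counter(t for d in docs for t in set(d))
        n = len(docs)
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        # "Training" simply stores every coded example in memory.
        self.memory = [(self._vectorize(d), c) for d, c in zip(docs, codes)]
        return self

    def predict(self, response):
        query = self._vectorize(tokenize(response))
        scored = sorted(
            ((sum(w * vec.get(t, 0.0) for t, w in query.items()), code)
             for vec, code in self.memory),
            key=lambda pair: pair[0],
            reverse=True,
        )[: self.k]
        votes = defaultdict(float)
        for similarity, code in scored:
            votes[code] += similarity
        if not votes:
            return None
        code, score = max(votes.items(), key=lambda kv: kv[1])
        # Return None when nothing in memory resembles the response.
        return code if score > 0 else None


def cross_validate(records, folds=10, k=3):
    """Fraction of held-out records coded correctly, averaged over all folds."""
    correct = 0
    for i in range(folds):
        test = records[i::folds]
        train = [r for j, r in enumerate(records) if j % folds != i]
        coder = MemoryBasedCoder(k).fit([r for r, _ in train], [c for _, c in train])
        correct += sum(coder.predict(r) == c for r, c in test)
    return correct / len(records)


# Invented toy data; the codes are placeholders, not real census codes.
train = [
    ("rice farming", "01"),
    ("vegetable farming", "01"),
    ("automobile parts manufacturing", "30"),
    ("car engine manufacturing", "30"),
    ("elementary school teaching", "85"),
    ("high school teaching", "85"),
]
coder = MemoryBasedCoder(k=3).fit([r for r, _ in train], [c for _, c in train])
print(coder.predict("rice farming cooperative"))
print(coder.predict("software consulting"))
print(cross_validate(train, folds=3))
```

Returning `None` for a response with no overlap in memory is one possible way to leave hard cases for manual coding rather than forcing a guess; whether the paper's system behaves this way is not stated in the abstract.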