Paper information - KCAT: A Korean Corpus Annotating Tool Minimizing Human Intervention

KCAT: A Korean Corpus Annotating Tool Minimizing Human Intervention

While large POS (part-of-speech) annotated corpora play an important role in natural language processing, such corpora must be highly accurate and consistent. To build an accurate and consistent corpus, manual tagging is often used, but manual tagging is labor-intensive and expensive. Furthermore, it is not easy to get consistent results from human experts. In this paper, we present an efficient tool for building large, accurate, and consistent corpora with minimal human labor. The proposed tool supports semiautomatic tagging. Using disambiguation rules acquired from human experts, it minimizes human intervention in both the manual tagging and post-editing steps.

Hae-Chang Rim | Jin-Dong Kim | Heui-Seok Lim | Won-Ho Ryu

[1] Yves Schabes, et al. Deterministic Part-of-Speech Tagging with Finite-State Transducers, 1995, Comput. Linguistics.