KCAT: A Korean Corpus Annotating Tool Minimizing Human Intervention
Large part-of-speech (POS) annotated corpora play an important role in natural language processing, but such corpora require very high accuracy and consistency. To build an accurate and consistent corpus, manual tagging is often used. Manual tagging, however, is labor intensive and expensive, and it is not easy to get consistent results from human experts. In this paper, we present an efficient tool for building large, accurate, and consistent corpora with minimal human labor. The proposed tool supports semiautomatic tagging. Using disambiguation rules acquired from human experts, it minimizes human intervention in both the manual tagging and post-editing steps.
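The abstract does not describe how the tool is implemented, so the following is only a minimal sketch of the general idea of semiautomatic tagging with expert-supplied disambiguation rules. Everything in it is an assumption made for illustration: the function `semi_automatic_tag`, the rule `noun_after_determiner`, the English example words, and the tag names are hypothetical and are not taken from KCAT or from any Korean tagset. Unambiguous tokens are tagged directly, rules are tried on ambiguous tokens, and only the tokens no rule can resolve are left for a human annotator.

```python
from typing import Callable, List, Optional, Tuple

# A token is a surface form plus the candidate POS tags proposed by some
# (hypothetical) morphological analyzer.
Token = Tuple[str, List[str]]
# A rule looks at the sentence and a position and returns a tag, or None
# if it does not apply.
Rule = Callable[[List[Token], int], Optional[str]]


def noun_after_determiner(sent: List[Token], i: int) -> Optional[str]:
    """Hypothetical expert rule: choose NOUN right after an unambiguous DET."""
    if i > 0 and sent[i - 1][1] == ["DET"] and "NOUN" in sent[i][1]:
        return "NOUN"
    return None


def semi_automatic_tag(
    sent: List[Token], rules: List[Rule]
) -> Tuple[List[Tuple[str, Optional[str]]], List[int]]:
    """Tag what can be decided automatically; report the rest for manual review."""
    tagged: List[Tuple[str, Optional[str]]] = []
    pending: List[int] = []  # positions that still need a human decision
    for i, (word, candidates) in enumerate(sent):
        if len(candidates) == 1:
            # Unambiguous token: no rule and no human needed.
            tagged.append((word, candidates[0]))
            continue
        choice: Optional[str] = None
        for rule in rules:
            choice = rule(sent, i)
            if choice is not None:
                break  # first applicable rule wins
        if choice is None:
            pending.append(i)
        tagged.append((word, choice))
    return tagged, pending


sentence = [
    ("the", ["DET"]),
    ("run", ["NOUN", "VERB"]),
    ("ended", ["VERB"]),
    ("light", ["NOUN", "ADJ", "VERB"]),
]
tagged, pending = semi_automatic_tag(sentence, [noun_after_determiner])
```

In this sketch, a decision that an expert makes once can be recorded as a new rule and reused on later sentences, which is one plausible way a rule-based tool could reduce repeated manual work and keep the resulting annotations consistent.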