Paper Information - Semi-Automatic Annotation Tool to Build Large Dependency Tree-Tagged Corpus


Developing robust statistical natural language processing systems requires corpora annotated with rich linguistic information. Building such corpora, however, is expensive, labor-intensive, and time-consuming. To ease this work, we designed and implemented an annotation tool for building a Korean dependency tree-tagged corpus. Compared with other annotation tools, ours is characterized by the following features: independence from applications, localization of errors, powerful error checking, instant sharing of annotated information, and user-friendliness. Using the tool, 33 annotators annotated 100,904 Korean sentences with dependency structures over a period of 5 months, at an average of about 4 minutes per sentence. We are confident that the tool yields accurate and consistent annotations while reducing labor and time.

Chang-Hyun Kim | Jae-Hoon Kim | Eun-Jin Park | Young Kil Kim
