A Japanese Particle Corpus Built by Example-Based Annotation

This paper reports on an ongoing project to create a new corpus focusing on Japanese particles. The corpus will provide deeper syntactic and semantic information than existing resources. The initial target particle is "to", which occurs 22,006 times in the 38,400 sentences of an existing corpus, the Kyoto Text Corpus. This annotation task adopts an "example-based" methodology that differs from the traditional annotation style: rather than a linguistic category label, annotators are given an example sentence. Because linguistic technical terms are avoided, any native speaker without special knowledge of linguistic analysis is expected to be able to serve as an annotator without lengthy training, which should reduce annotation cost. So far, 10,475 occurrences have been annotated, with an inter-annotator agreement of 0.66 measured by Cohen's kappa. The paper also discusses initial analyses of the disagreements, as well as future directions.
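The agreement figure above is Cohen's kappa, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is an illustration, not the authors' code. It assumes that in an example-based scheme each annotator's "label" can simply be the ID of the example sentence they chose; the IDs ex1 to ex4 and the eight annotations are hypothetical toy data, not figures from the corpus.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the chance agreement implied by each annotator's label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of the two marginal proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item; kappa is
        # undefined here, so by convention report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations: each entry is the ID of the example sentence an
# annotator judged most similar to one occurrence of the particle "to".
annotator_a = ["ex1", "ex1", "ex2", "ex3", "ex2", "ex4", "ex1", "ex3"]
annotator_b = ["ex1", "ex2", "ex2", "ex3", "ex2", "ex4", "ex1", "ex1"]

print(round(cohens_kappa(annotator_a, annotator_b), 3))  # prints 0.652
```

Here the annotators agree on 6 of 8 items (p_o = 0.75), but their label distributions give p_e = 18/64, so kappa drops to 15/23 ≈ 0.652. Because kappa only tests labels for equality, it applies unchanged whether the labels are category names or example-sentence IDs.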

Hideki Mima | Jun'ichi Tsujii | Hiroki Hanaoka

[1] Makoto Nagao et al. Building a Japanese parsed corpus while improving the parsing system. 1997.

[2] J. Fleiss. Measuring nominal scale agreement among many raters. 1971.

[3] Yuji Matsumoto et al. Annotating a Japanese Text Corpus with Predicate-Argument and Coreference Relations. LAW@ACL, 2007.

[4] Jacob Cohen. A Coefficient of Agreement for Nominal Scales. 1960.

[5] Yusuke Miyao et al. From Linguistic Theory to Syntactic Analysis: Corpus-Oriented Grammar Development and Feature Forest Model. 2006.