Paper information - Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications

Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications

In "Low inter-annotator agreement = an ill-defined problem?", we argued that tasks with low inter-annotator agreement are common in natural language processing (NLP) and deserve appropriate attention, and we outlined a preliminary solution for their evaluation. In "On evaluation of natural language processing tasks: Is gold standard evaluation methodology a good solution?", we argued for extrinsic, application-based evaluation of NLP tasks and against the gold standard methodology, which is currently almost the only one actually used in the NLP field. This paper synthesizes the two: for three practical tasks whose inter-annotator agreement is normally so low that they are considered almost irrelevant to any scientific evaluation, we introduce an application-based evaluation scenario which illustrates that it is not only possible to evaluate them scientifically, but that this type of evaluation is much more telling than the gold standard approach.

Vojtech Kovár

[1] Pavel Rychlý, et al. Low Inter-Annotator Agreement = An Ill-Defined Problem?, 2014, RASLAN.

[2] Ales Horák, et al. On Evaluation of Natural Language Processing Tasks - Is Gold Standard Evaluation Methodology a Good Solution?, 2016, ICAART.

[3] Geoffrey Sampson, et al. A proposal for improving the measurement of parse accuracy, 2000.

[4] Adam Kilgarriff, et al. A Quantitative Evaluation of Word Sketches, 2010.

[5] Adam Kilgarriff, et al. Extrinsic Corpus Evaluation with a Collocation Dictionary Task, 2014, LREC.

[6] Geoffrey Sampson, et al. A test of the leaf-ancestor metric for parse accuracy, 2003, Natural Language Engineering.

[7] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[8] Ralph Grishman, et al. Evaluating Parsing Strategies Using Standardized Parse Files, 1992, ANLP.

[9] Adam Kilgarriff, et al. The Sketch Engine: ten years on, 2014.