Paper information - Critical Reflections on Evaluation Practices in Coreference Resolution

In this paper we revisit the task of quantitative evaluation of coreference resolution systems. We review the most commonly used metrics (MUC, B³, CEAF and BLANC) on the basis of their evaluation of coreference resolution in five texts from the OntoNotes corpus. We examine both the correlation between the metrics and the degree to which our human judgement of coreference resolution agrees with the metrics. In conclusion, we claim that loss of information value is an essential factor in human perception of the degree of success or failure of coreference resolution, and that current metrics address it insufficiently. We thus conjecture that including a layer of mention information weight could improve both coreference resolution and its evaluation.

Gordana Ilic Holen

[1] Breck Baldwin, et al. Algorithms for Scoring Coreference Chains, 1998.

[2] M. Recasens, et al. BLANC: Implementing the Rand index for coreference evaluation, 2010, Natural Language Engineering.

[3] Xiaoqiang Luo, et al. On Coreference Resolution Performance Metrics, 2005, HLT.

[4] Yuchen Zhang, et al. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes, 2012, EMNLP-CoNLL Shared Task.

[5] Jian Su, et al. Coreference Resolution Using Competition Learning Approach, 2003, ACL.

[6] Pascal Denis, et al. Global joint models for coreference resolution and named entity classification, 2009, Proces. del Leng. Natural.

[7] Nianwen Xue, et al. CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes, 2011, CoNLL Shared Task.

[8] Lynette Hirschman, et al. A Model-Theoretic Coreference Scoring Scheme, 1995, MUC.

[9] Yannick Versley, et al. SemEval-2010 Task 1: Coreference Resolution in Multiple Languages, 2009, *SEMEVAL.

[10] Xiaoqiang Luo, et al. A Mention-Synchronous Coreference Resolution Algorithm Based On the Bell Tree, 2004, ACL.

[11] Heeyoung Lee, et al. Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task, 2011, CoNLL Shared Task.

[12] William M. Rand, et al. Objective Criteria for the Evaluation of Clustering Methods, 1971.