Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study

Event detection remains a difficult task due to the complexity and ambiguity of events as linguistic entities. On the one hand, inter-annotator agreement among experts annotating events is low, despite the multitude of existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems achieve lower F1-scores than extractors for other entity types, such as people or locations. In this paper we study the consistency and completeness of expert-annotated datasets for events and time expressions, and we propose a data-agnostic methodology for validating such datasets along these two dimensions. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated event datasets. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system: our results show that crowd-annotated events increase the system's performance by at least 5.3%.
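To make the two validation dimensions concrete, the minimal sketch below compares expert event spans against crowd-annotated spans: expert spans unconfirmed by the crowd flag potential consistency problems, while crowd spans absent from the expert set are completeness candidates. The span format, the overlap-based matching rule, and all function names are assumptions introduced for this illustration; the paper's actual quality metrics are not reproduced here.

```python
# Illustrative sketch only: the annotation format (doc_id, start, end) and the
# overlap-based matching rule are assumptions, not the paper's exact procedure.
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (doc_id, character start offset, character end offset)

def overlaps(a: Span, b: Span) -> bool:
    """Two spans match if they belong to the same document and overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def validate(expert: Set[Span], crowd: Set[Span]):
    """Return (consistency flags, completeness candidates):
    expert spans not confirmed by any crowd span, and
    crowd spans with no counterpart in the expert annotations."""
    unconfirmed = {e for e in expert if not any(overlaps(e, c) for c in crowd)}
    missing = {c for c in crowd if not any(overlaps(c, e) for e in expert)}
    return unconfirmed, missing

if __name__ == "__main__":
    expert = {("doc1", 10, 17), ("doc1", 40, 48)}
    crowd = {("doc1", 12, 17), ("doc1", 60, 66)}
    unconfirmed, missing = validate(expert, crowd)
    print("Expert spans not confirmed by the crowd:", unconfirmed)  # {("doc1", 40, 48)}
    print("Crowd spans missing from the expert set:", missing)      # {("doc1", 60, 66)}
```

In this toy run, the expert span ("doc1", 40, 48) would be flagged for re-examination, and the crowd span ("doc1", 60, 66) would be a candidate for extending the expert dataset.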
