Active Learning for Coreference Resolution using Discrete Annotation

We improve upon pairwise annotation for active learning in coreference resolution by asking annotators to identify mention antecedents when a presented mention pair is deemed not coreferent. This simple modification, combined with a novel mention clustering algorithm for selecting which examples to label, is much more efficient, yielding higher performance for a given annotation budget. In experiments with existing benchmark coreference datasets, we show that the signal from this additional question leads to significant performance gains per hour of human annotation. Future work can use our annotation protocol to effectively develop coreference models for new domains. Our code is publicly available at this https URL.
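
The two-step query described above can be illustrated with a simulated annotator, as is common when running active learning experiments on already-labeled data. The sketch below is hypothetical and is not the authors' released code. The names `Answer` and `simulated_discrete_query` are illustrative, mentions are assumed to be `(start, end)` token spans, and, when the pair is rejected, the simulated annotator is assumed to point to the closest preceding mention in the same gold cluster.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumption: a mention is identified by its (start, end) token span.
Span = Tuple[int, int]


@dataclass
class Answer:
    """Outcome of one discrete-annotation query (hypothetical structure)."""
    coreferent: bool            # answer to the pairwise question
    antecedent: Optional[Span]  # antecedent link recovered from the query, if any


def simulated_discrete_query(
    mention: Span,
    proposed: Span,
    gold_cluster: Dict[Span, int],
    mentions: List[Span],
) -> Answer:
    """Simulate an annotator answering one discrete-annotation query.

    Step 1 (pairwise): is `proposed` coreferent with `mention`?
    Step 2 (only if step 1 is "no"): point to an antecedent of `mention`,
    or None if it has none. The choice of the closest preceding gold
    mention is an assumption made for this sketch.
    """
    cid = gold_cluster.get(mention)
    if cid is None:
        # Mention is not in any gold cluster, so it has no antecedent.
        return Answer(coreferent=False, antecedent=None)
    if gold_cluster.get(proposed) == cid:
        # Pairwise "yes": the proposed pair is itself a positive link.
        return Answer(coreferent=True, antecedent=proposed)
    # Pairwise "no": instead of stopping at a negative label, ask for the
    # actual antecedent among earlier mentions in the same gold cluster.
    preceding = [m for m in mentions if m < mention and gold_cluster.get(m) == cid]
    return Answer(coreferent=False, antecedent=max(preceding) if preceding else None)


if __name__ == "__main__":
    mentions = [(0, 1), (5, 5), (9, 9), (12, 13)]
    gold = {(0, 1): 0, (5, 5): 1, (9, 9): 0, (12, 13): 0}
    print(simulated_discrete_query((12, 13), (5, 5), gold, mentions))
    # Answer(coreferent=False, antecedent=(9, 9))
```

In this toy example, a purely pairwise query for the pair ((12, 13), (5, 5)) would return only a negative label. The follow-up question also recovers the positive link to (9, 9), which illustrates the additional signal that the discrete protocol collects from each query.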
