Optimizing Nugget Annotations with Active Learning

Nugget-based evaluations, such as those deployed in the TREC Temporal Summarization and Question Answering tracks, require human assessors to determine whether a nugget is present in a given piece of text. This process, known as nugget annotation, is labor-intensive. In this paper, we present two active learning techniques that prioritize the order in which candidate nugget/sentence pairs are presented to an assessor, based on the likelihood that the sentence contains the nugget. Our approach builds on the observation that nugget annotation closely resembles high-recall retrieval, which lets us adapt proven solutions from that setting. Simulation experiments on four existing TREC test collections show that our techniques yield far more matches for a given level of assessor effort than the baselines typically deployed in previous nugget-based evaluations.
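To make the general approach concrete, the sketch below shows a continuous active learning (CAL) loop of the kind used in high-recall retrieval, adapted to ordering nugget/sentence pairs: a classifier is retrained after each judgment and the most likely match is presented to the assessor next. The TF-IDF features, logistic-regression model, and all names (cal_annotation_order, oracle, the seed indices) are illustrative assumptions, not details taken from the paper.

```python
# A minimal CAL sketch for nugget annotation, assuming TF-IDF features
# and a logistic-regression relevance model (both are assumptions, not
# the paper's exact configuration).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def cal_annotation_order(pairs, oracle, seed_pos, seed_neg, budget):
    """Order candidate nugget/sentence pairs for assessment.

    pairs:    list of strings, each a nugget concatenated with a
              candidate sentence
    oracle:   callable(index) -> 0/1, simulating the human assessor
    seed_pos, seed_neg: indices of one known match and one known
              non-match, used to bootstrap the classifier
    budget:   number of assessments the (simulated) assessor makes
    """
    X = TfidfVectorizer().fit_transform(pairs)
    labeled = {seed_pos: 1, seed_neg: 0}
    matches = []
    for _ in range(budget):
        # Retrain on everything judged so far.
        idx = list(labeled)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X[idx], [labeled[i] for i in idx])
        # Score the still-unjudged pairs and present the most
        # likely match to the assessor next.
        unjudged = [i for i in range(len(pairs)) if i not in labeled]
        if not unjudged:
            break
        scores = clf.predict_proba(X[unjudged])[:, 1]
        nxt = unjudged[int(np.argmax(scores))]
        labeled[nxt] = oracle(nxt)
        if labeled[nxt]:
            matches.append(nxt)
    return matches
```

In a simulation experiment of the kind the abstract describes, the oracle is simply a lookup into gold-standard judgments (for example, `oracle = lambda i: gold[i]`), and effectiveness is measured as the number of matches found within a given assessment budget.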
