Beyond Pooling

Dynamic Sampling is a novel, non-uniform, statistical sampling strategy in which documents are selected for relevance assessment based on the results of prior assessments. Unlike the static and dynamic pooling methods commonly used to compile relevance assessments for information retrieval test collections, Dynamic Sampling yields a statistical sample from which substantially unbiased estimates of effectiveness measures may be derived. In contrast to static sampling strategies, which make no use of relevance assessments, Dynamic Sampling can select documents from a much larger universe, yielding superior test collections for a given budget of relevance assessments. These assertions are supported by simulation studies using secondary data from the TREC 2017 Common Core Track.
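The key statistical idea, that a non-uniform sample still yields unbiased estimates, is standard inverse-probability (Horvitz-Thompson) weighting: each sampled document's contribution is divided by its inclusion probability. The sketch below is an illustration of that principle only, not the paper's actual Dynamic Sampling algorithm; the universe, probabilities, and relevance oracle are hypothetical.

```python
import random

def horvitz_thompson_total(universe, inclusion_probs, is_relevant, seed=0):
    """Draw a Poisson sample with the given per-document inclusion
    probabilities and return a Horvitz-Thompson estimate of the total
    number of relevant documents in the universe. The estimate is
    unbiased regardless of how non-uniform the probabilities are."""
    rng = random.Random(seed)
    estimate = 0.0
    for doc, p in zip(universe, inclusion_probs):
        if rng.random() < p:           # document enters the sample
            if is_relevant(doc):       # one relevance assessment spent
                estimate += 1.0 / p    # inverse-probability weight
    return estimate

# Hypothetical demo: 10,000 documents, the first 100 relevant,
# with higher inclusion probability for "highly ranked" documents.
universe = list(range(10000))
probs = [0.5 if d < 1000 else 0.01 for d in universe]
estimate = horvitz_thompson_total(universe, probs, lambda d: d < 100, seed=1)
```

Any single run is noisy, but averaging the estimator over many random seeds converges to the true total of 100, despite relevant documents being sampled at only 50% probability; this is the sense in which non-uniform sampling remains statistically unbiased.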
