Ranking and Feedback-based Stopping for Recall-Centric Document Retrieval

Systematic reviews require researchers to identify the entire body of relevant literature. Algorithms that filter the list of candidate articles before manual screening, while retaining nearly perfect recall, can significantly decrease the workload. This paper presents a novel stopping criterion, S-D Minimal Sampling, which estimates the score distribution of relevant articles from relevance feedback on randomly sampled articles. Using 20 training and 30 test topics, we achieve a mean recall of 93.3% while filtering out 59.1% of the articles. The approach achieves higher F2 scores at a significantly reduced manual reviewing workload. The method is especially suited to scenarios in which sufficiently many relevant articles (>5) can be sampled and employed for relevance feedback.
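The stopping idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian model for the scores of relevant articles, the function names estimate_cutoff and f_beta, the is_relevant callback (a stand-in for manual relevance feedback), and the 0.95 target recall are all assumptions made for illustration. Only the requirement of more than five sampled relevant articles is taken from the text.

```python
import random
from statistics import NormalDist, mean, stdev


def estimate_cutoff(scores, is_relevant, sample_size,
                    target_recall=0.95, min_relevant=6, seed=0):
    """Estimate a score cutoff from relevance feedback on a random sample.

    Assumption (not from the paper): scores of relevant articles are
    roughly Gaussian. Returns None if fewer than `min_relevant` relevant
    articles were sampled, i.e. the estimate would be unreliable.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(scores))
    sampled = rng.sample(range(len(scores)), k)
    # is_relevant(i) stands in for a reviewer judging article i.
    rel_scores = [scores[i] for i in sampled if is_relevant(i)]
    if len(rel_scores) < min_relevant:
        return None
    mu, sigma = mean(rel_scores), stdev(rel_scores)
    if sigma == 0.0:
        return mu  # degenerate case: all sampled relevant scores are equal
    # Score below which an estimated (1 - target_recall) share of the
    # relevant articles falls; articles scoring lower are filtered out.
    return NormalDist(mu, sigma).inv_cdf(1.0 - target_recall)


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    denom = beta ** 2 * precision + recall
    if denom == 0.0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom
```

Articles whose ranking score falls below the returned cutoff would be excluded from manual screening; a return value of None signals that too few relevant articles were sampled and that the full list should be reviewed instead.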
