Computer-Assisted Relevance Assessment: A Case Study of Updating Systematic Medical Reviews

It is becoming increasingly challenging for health professionals to keep up to date with current research. To save time, many experts perform evidence syntheses based on systematic reviews rather than primary studies. Consequently, reviews must be updated to incorporate new evidence, a task that demands considerable effort and delays the update process. This effort can be substantially reduced by applying computer-assisted techniques to identify relevant studies. In this study, we followed a “human-in-the-loop” approach, engaging medical experts in a controlled user experiment on updating systematic reviews. The primary outcome of interest was the performance achieved when judging full abstracts versus single sentences accompanied by Natural Language Inference labels. The experiment also included post-task questionnaires to collect participants’ feedback on the usability of the computer-assisted suggestions. The findings lead us to conclude that sentence-level relevance assessment achieves higher recall.
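As a minimal illustration of the sentence-labelling step described above (a sketch, not the authors' implementation), the snippet below pairs a candidate sentence from a new study with a claim drawn from the review under update and labels the pair with an off-the-shelf Natural Language Inference model. The model choice, the example claim, and the helper function are illustrative assumptions; any MNLI-style classifier could stand in.

```python
# Sketch of sentence-level NLI labelling for relevance suggestions.
# Assumption: an off-the-shelf MNLI model (here roberta-large-mnli) is a
# reasonable stand-in for whatever entailment model a real system would use.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def nli_label(premise: str, hypothesis: str) -> str:
    """Return the NLI label for a (sentence, review-claim) pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
    return ["contradiction", "neutral", "entailment"][int(logits.argmax(dim=-1))]

# Hypothetical example: a sentence from a new study vs. a review claim.
claim = "The intervention reduces mortality in adult patients."
sentence = "Mortality was significantly lower in the treatment arm."
print(nli_label(premise=sentence, hypothesis=claim))  # e.g. "entailment"
```

In a setup like the one studied here, sentences labelled as entailing or contradicting a review claim would be the ones shown to assessors, with the NLI label displayed alongside each sentence.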
