Seed-driven Document Ranking for Systematic Reviews in Evidence-Based Medicine

Systematic review (SR) in evidence-based medicine is a form of literature review that provides a conclusion to a specific clinical question. To ensure credible and reproducible conclusions, SRs are conducted through well-defined steps. One of the key steps, screening, is to identify relevant documents from a pool of candidate documents. Typically, about 2,000 candidate documents are retrieved from databases using keyword queries for an SR, from which about 20 relevant documents are manually identified by SR experts based on detailed relevance conditions, or eligibility criteria. Recent studies show that document ranking, or screening prioritization, is a promising way to improve the manual screening process. In this paper, we propose a seed-driven document ranking (SDR) model for effective screening, under the assumption that one relevant document, the seed document, is known. Based on a detailed analysis of the characteristics of relevant documents, SDR represents documents using a bag of clinical terms rather than the commonly used bag of words. More importantly, we propose a method to estimate the importance of clinical terms based on their distribution in the candidate documents. On the benchmark dataset released by CLEF'17 eHealth Task 2, we show that the proposed SDR outperforms state-of-the-art solutions. Interestingly, we also observe that ranking based on word-embedding representations of documents complements SDR well; the best ranking is achieved by combining the relevance scores estimated by SDR and by word embeddings. Additionally, we report results of simulating the manual screening process with SDR.
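
The sketch below is a minimal illustration of this scheme, not the authors' exact formulation. It assumes candidate documents have already been reduced to bags of clinical terms (e.g., by a concept extractor such as QuickUMLS), approximates term importance with a smoothed inverse document frequency computed over the candidate pool, and combines the SDR score with a word-embedding similarity by simple linear interpolation; all of these specific choices are assumptions made for illustration.

```python
from collections import Counter
from math import log

def term_weights(candidate_term_sets):
    """Estimate the importance of each clinical term from its distribution
    in the candidate pool (illustrative: a smoothed inverse document
    frequency; the paper's actual estimator may differ)."""
    n = len(candidate_term_sets)
    df = Counter(t for terms in candidate_term_sets for t in set(terms))
    return {t: log(1.0 + n / df[t]) for t in df}

def sdr_score(seed_terms, doc_terms, weights):
    """Score a candidate document by the weighted overlap of its clinical
    terms with those of the seed document."""
    shared = set(seed_terms) & set(doc_terms)
    return sum(weights.get(t, 0.0) for t in shared)

def combined_score(sdr, embedding_sim, alpha=0.5):
    """Interpolate the SDR score with a word-embedding similarity, one
    simple way to combine the two complementary signals."""
    return alpha * sdr + (1.0 - alpha) * embedding_sim

# Toy candidate pool: each document is a bag of extracted clinical terms.
candidates = [
    ["procalcitonin", "pyelonephritis", "children"],
    ["bile duct stones", "ultrasound", "liver function tests"],
    ["pyelonephritis", "c-reactive protein", "diagnosis"],
]
seed = ["pyelonephritis", "procalcitonin", "diagnosis"]

weights = term_weights(candidates)
ranking = sorted(
    range(len(candidates)),
    key=lambda i: sdr_score(seed, candidates[i], weights),
    reverse=True,
)
print("screening order (candidate indices):", ranking)

# Combining with a placeholder word-embedding similarity for the top candidate.
top = ranking[0]
print("combined score:",
      combined_score(sdr_score(seed, candidates[top], weights), embedding_sim=0.8))
```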
