Creating enriched training sets of eligible studies for large systematic reviews: the utility of PubMed's Best Match algorithm

Abstract

Introduction: Solutions such as crowd screening and machine learning can assist systematic reviewers facing heavy screening burdens, but they require training sets containing a mix of eligible and ineligible studies. This study explores using PubMed's Best Match algorithm to create small training sets containing at least five relevant studies.

Methods: Six systematic reviews were examined retrospectively. Their MEDLINE searches were converted and run in PubMed, and the ranking of included studies was compared under the Best Match and Most Recent sort orders.

Results: Retrieval sizes for the systematic reviews ranged from 151 to 5,406 records, and the number of relevant records ranged from 8 to 763. For all six reviews, the median ranking of relevant records was higher (closer to the top of the list) under Best Match than under Most Recent. Best Match placed a total of thirty relevant records in the first fifty, with at least one for each systematic review; Most Recent placed only ten. Best Match outperformed Most Recent in all cases and placed five or more relevant records in the first fifty in three of the six cases.

Discussion: A predetermined set size such as fifty may not provide enough true positives for an effective systematic review training set. However, screening PubMed records ranked by Best Match and continuing until the desired number of true positives has been identified is efficient and effective.

Conclusions: The Best Match sort in PubMed improves the ranking of relevant records and increases their proportion in the first fifty records, relative to sorting by recency.
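The comparison above rests on three simple measures computed from a ranked retrieval list: the median position of relevant records, the number of relevant records within a fixed cutoff (fifty in the study), and the number of records that must be screened in rank order before a target number of true positives (five in the study) is reached. The following is a minimal Python sketch, assuming each sort order has been exported as an ordered list of record identifiers (for example PMIDs) and that each review's included studies are available as a set of identifiers. The function names and toy data are hypothetical illustrations, not the authors' analysis code.

```python
from statistics import median


def relevant_ranks(ranked_ids, relevant_ids):
    """Return the 1-based positions of relevant records in a ranked list."""
    relevant = set(relevant_ids)
    return [pos for pos, rid in enumerate(ranked_ids, start=1) if rid in relevant]


def median_rank(ranked_ids, relevant_ids):
    """Median position of relevant records; a smaller number means they rank higher."""
    ranks = relevant_ranks(ranked_ids, relevant_ids)
    return median(ranks) if ranks else None


def relevant_in_top(ranked_ids, relevant_ids, n=50):
    """Count relevant records among the first n ranked records (fixed set size)."""
    return len(relevant_ranks(ranked_ids[:n], relevant_ids))


def records_to_screen(ranked_ids, relevant_ids, target=5):
    """Screen in rank order until `target` relevant records are found.

    Returns the number of records screened, or None if the list runs out first.
    """
    relevant = set(relevant_ids)
    found = 0
    for screened, rid in enumerate(ranked_ids, start=1):
        if rid in relevant:
            found += 1
            if found == target:
                return screened
    return None


# Hypothetical toy data: "r*" marks a relevant record; other IDs are ineligible.
relevant = {"r1", "r2", "r3", "r4", "r5"}
best_match = ["a", "r1", "b", "r2", "r3", "c", "r4", "d", "r5", "e"]
most_recent = ["a", "b", "c", "r1", "d", "e", "r2", "r3", "r4", "r5"]

print(median_rank(best_match, relevant), median_rank(most_recent, relevant))  # 5 8
# The toy lists are short, so a cutoff of 5 stands in for the study's fifty.
print(relevant_in_top(best_match, relevant, n=5),
      relevant_in_top(most_recent, relevant, n=5))  # 3 1
print(records_to_screen(best_match, relevant),
      records_to_screen(most_recent, relevant))  # 9 10
```

The `records_to_screen` stopping rule reflects the Discussion's recommendation: rather than fixing the training set at fifty records, screening continues down the Best Match ranking until the desired number of true positives has been found, so the set size adapts to each review.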
