Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy

Background: The large and increasing number of new studies published each year makes literature identification for systematic reviews ever more time-consuming and costly. Technological assistance has been suggested as an alternative to conventional manual study identification to mitigate this cost, but previous work has mainly evaluated methods in terms of recall (search sensitivity) and workload reduction. There is also a need to evaluate whether screening prioritization methods lead to the same results and conclusions as exhaustive manual screening. In this study, we examined the impact of a screening prioritization method based on active learning on the sensitivity and specificity estimates in systematic reviews of diagnostic test accuracy.

Methods: We simulated the screening process in 48 Cochrane reviews of diagnostic test accuracy and re-ran 400 meta-analyses based on at least 3 studies. We compared screening prioritization (with technological assistance) against screening in randomized order (standard practice without technological assistance). We examined whether screening could have been stopped before all relevant studies were identified while still producing reliable summary estimates. For all meta-analyses, we also examined the relationship between the number of relevant studies and the reliability of the final estimates.

Results: The main meta-analysis in each systematic review could have been performed after screening an average of 30% of the candidate articles (range 0.07% to 100%). No systematic review would have required screening more than 2308 studies, whereas manual screening would have required screening up to 43,363 studies. Despite an average recall of 70%, the estimation error would have been 1.3% on average, compared with the 2% estimation error expected on average when replicating summary estimate calculations.

Conclusion: Screening prioritization coupled with stopping criteria in diagnostic test accuracy reviews can reliably detect when the screening process has identified enough studies to perform the main meta-analysis with an accuracy within pre-specified tolerance limits. However, many of the systematic reviews did not identify enough studies for the meta-analyses to be accurate within a 2% limit, even with exhaustive manual screening, i.e., under current practice.
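To make the simulated workflow concrete, the sketch below walks through one screening-prioritization run on a toy candidate pool. It is a minimal illustration, not the authors' pipeline: the TF-IDF features, logistic-regression ranker, batch size of 25, random seed set, and the names screen_with_prioritization and pooled_sensitivity are all assumptions introduced here. The crude pooled-sensitivity stand-in also replaces the bivariate random-effects model typically fitted in Cochrane meta-analyses of diagnostic test accuracy; it is only meant to show how the estimation error, i.e., the absolute difference between the summary estimate at a stopping point and the estimate from all relevant studies, can be computed.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Stand-in candidate pool: a few "relevant" DTA-style abstracts hidden in
    # a large pool of irrelevant ones (the real pools here reached 43,363).
    relevant = [f"index test sensitivity specificity diagnostic accuracy {i}" for i in range(30)]
    irrelevant = [f"unrelated treatment trial outcome cohort report {i}" for i in range(970)]
    labels = np.array([1] * len(relevant) + [0] * len(irrelevant))
    X = TfidfVectorizer().fit_transform(relevant + irrelevant)

    def screen_with_prioritization(X, labels, batch=25, seed_size=50):
        """Active-learning loop: retrain on all records screened so far and
        screen the highest-ranked unscreened records next. Returns the
        (fraction screened, recall) curve, evaluated retrospectively."""
        n = X.shape[0]
        screened = rng.choice(n, size=seed_size, replace=False).tolist()
        curve = []
        while len(screened) < n:
            unscreened = np.setdiff1d(np.arange(n), screened)
            seen = labels[screened]
            if seen.min() == seen.max():
                # Only one class labelled so far: continue in random order
                # (random order is also the manual-screening comparator).
                nxt = rng.choice(unscreened, size=min(batch, unscreened.size), replace=False)
            else:
                clf = LogisticRegression(max_iter=1000).fit(X[screened], seen)
                scores = clf.predict_proba(X[unscreened])[:, 1]
                nxt = unscreened[np.argsort(scores)[::-1][:batch]]
            screened.extend(nxt.tolist())
            curve.append((len(screened) / n, labels[screened].sum() / labels.sum()))
        return curve

    def pooled_sensitivity(tp, fn):
        """Crude summary sensitivity: unweighted mean of study-level logit
        sensitivities with a 0.5 continuity correction. A simplistic
        stand-in for the bivariate random-effects model."""
        logit = np.log((tp + 0.5) / (fn + 0.5)).mean()
        return 1.0 / (1.0 + np.exp(-logit))

    curve = screen_with_prioritization(X, labels)
    frac = next(f for f, r in curve if r >= 0.70)
    print(f"70% recall reached after screening {frac:.0%} of the pool")

    # Estimation error if a stopping point had missed the third study
    # (the 2x2 counts below are made up for illustration):
    tp, fn = np.array([45, 30, 60]), np.array([5, 10, 8])
    err = abs(pooled_sensitivity(tp[:2], fn[:2]) - pooled_sensitivity(tp, fn))
    print(f"estimation error from the missed study: {err:.1%}")

The random-order fallback before the first relevant record is labelled doubles as the manual-screening comparator from the Methods. Note also that the recall check at the end is retrospective, which is only possible in a simulation; a real review would need a prospective stopping criterion, since true recall is unobservable while screening.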
