Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow

Background: Current text-mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated workflow to conduct abstract screening for systematic reviews and other knowledge synthesis methods.

Methods: We adopted widely recommended text-mining and machine-learning methods to (1) process title-abstracts into numerical training data and (2) train a classification model to predict eligible abstracts. The predicted abstracts are screened by human reviewers for ("true") eligibility, and the newly eligible abstracts are used to identify similar abstracts, using near-neighbor methods, which are also screened. These abstracts, together with their eligibility results, are used to update the classification model, and the above steps are iterated until no new eligible abstracts are identified. The workflow was implemented in R and evaluated using a systematic review of insulin formulations for type 1 diabetes (14,314 abstracts) and a scoping review of knowledge-synthesis methods (17,200 abstracts). Workflow performance was evaluated against the recommended practice of screening abstracts by 2 reviewers, independently. Standard measures were examined: sensitivity (the proportion of truly eligible abstracts identified as eligible), specificity (the proportion of truly ineligible abstracts identified as ineligible), precision (the proportion of truly eligible abstracts among all abstracts screened as eligible), F1-score (the harmonic mean of sensitivity and precision), and accuracy (the proportion of abstracts correctly predicted as eligible or ineligible). Workload reduction was measured as the hours of screening the workflow saved, given that only a subset of abstracts required human screening.

Results: For the systematic and scoping reviews respectively, the workflow attained 88%/89% sensitivity, 99%/99% specificity, 71%/72% precision, a 79%/79% F1-score, 98%/97% accuracy, and 63%/55% workload reduction, with 12%/11% fewer abstracts for full-text retrieval and screening, and 0%/1.5% missed studies in the completed reviews.

Conclusion: The workflow was a sensitive, precise, and efficient alternative to the recommended practice of screening abstracts with 2 reviewers. All eligible studies were identified in the first case, while 6 studies (1.5%) were missed in the second; their omission would likely not have changed the review's conclusions. We have described the workflow in language accessible to reviewers with limited exposure to natural language processing and machine learning, and have made the code available to reviewers.
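
To make the Methods concrete, the following is a minimal R sketch of the iterative loop described above. The specific choices here are illustrative stand-ins, not the paper's implementation: TF-IDF features via the tm package, a random forest classifier, cosine similarity for the near-neighbor step, and a hypothetical screen_by_reviewer() function representing the human screening step.

library(tm)            # corpus handling, TF-IDF document-term matrix
library(randomForest)  # classification model

# (1) Process title-abstracts into numerical training data.
featurize <- function(abstracts) {
  corpus <- VCorpus(VectorSource(abstracts))
  dtm <- DocumentTermMatrix(corpus, control = list(
    weighting = weightTfIdf, removePunctuation = TRUE, stopwords = TRUE))
  as.matrix(dtm)
}

# Cosine similarity of abstract i to every abstract (rows of X).
cosine_to <- function(X, i) {
  Xn <- X / sqrt(rowSums(X^2) + 1e-12)
  drop(Xn %*% Xn[i, ])
}

semi_auto_screen <- function(abstracts, seed_idx, seed_labels, k = 5) {
  X        <- featurize(abstracts)
  screened <- seed_idx                                 # screened by humans so far
  labels   <- setNames(as.character(seed_labels), seed_idx)

  repeat {
    # (2) Train/update the classification model on all screened abstracts.
    fit  <- randomForest(x = X[screened, , drop = FALSE],
                         y = factor(labels[as.character(screened)]))
    pool <- setdiff(seq_len(nrow(X)), screened)
    if (length(pool) == 0) break

    # Predicted-eligible abstracts go to humans for "true" eligibility.
    queue <- pool[predict(fit, X[pool, , drop = FALSE]) == "eligible"]
    if (length(queue) == 0) break
    verdicts <- screen_by_reviewer(abstracts[queue])   # hypothetical human step
    labels[as.character(queue)] <- verdicts
    screened <- c(screened, queue)
    n_new    <- sum(verdicts == "eligible")

    # Near-neighbor step: screen the k abstracts most similar to each
    # newly eligible abstract.
    for (i in queue[verdicts == "eligible"]) {
      pool <- setdiff(seq_len(nrow(X)), screened)
      if (length(pool) == 0) break
      nn <- pool[order(cosine_to(X, i)[pool], decreasing = TRUE)]
      nn <- nn[seq_len(min(k, length(nn)))]
      v  <- screen_by_reviewer(abstracts[nn])          # hypothetical human step
      labels[as.character(nn)] <- v
      screened <- c(screened, nn)
      n_new    <- n_new + sum(v == "eligible")
    }

    # Iterate until an iteration identifies no new eligible abstracts.
    if (n_new == 0) break
  }
  screened[labels[as.character(screened)] == "eligible"]
}

The two screen_by_reviewer() calls mark where the workflow remains semi-automated: the model and the near-neighbor search only propose abstracts for screening, while eligibility decisions stay with human reviewers.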
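
The evaluation measures reduce to arithmetic on a 2x2 confusion matrix of workflow decisions against the dual-reviewer reference standard. The R lines below show the calculations; the confusion-matrix counts are invented for the example (chosen only to echo the reported magnitudes), and the human-screening count in the workload line is likewise an assumption, not the paper's data.

# Illustrative counts: tp/fp/fn/tn of workflow decision vs. reference standard.
confusion <- c(tp = 100, fp = 40, fn = 14, tn = 9000)

metrics <- with(as.list(confusion), c(
  sensitivity = tp / (tp + fn),               # truly eligible abstracts retained
  specificity = tn / (tn + fp),               # truly ineligible abstracts excluded
  precision   = tp / (tp + fp),               # truly eligible among those flagged
  F1          = 2 * tp / (2 * tp + fp + fn),  # harmonic mean of sens. and prec.
  accuracy    = (tp + tn) / (tp + fp + fn + tn)))
round(metrics, 2)

# Workload reduction: share of abstracts spared human screening, convertible
# to hours at an assumed screening rate (n_human is a hypothetical count).
n_total <- 14314; n_human <- 5300
1 - n_human / n_total   # ~0.63, i.e., 63% workload reduction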
