Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error

Background: Here we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing broad and shallow systematic review of preclinical animal studies, with the aim of achieving a high-performing algorithm comparable to human screening.

Methods: We applied ML approaches to a broad systematic review of animal models of depression at the citation screening stage. We tested two independently developed ML approaches, which used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95% sensitivity and to maximise specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of the preclinical biomedical sciences systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human-screened records, in order to identify potential errors made during the human screening process (error analysis).

Results: ML approaches reached 98.7% sensitivity based on learning from a training set of 5749 records, with an inclusion prevalence of 13.2%. The highest level of specificity reached was 86%. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified by using the inclusion likelihood assigned by the ML model to highlight discrepancies. Training the ML algorithm on the corrected dataset improved the specificity of the algorithm without compromising sensitivity. Error analysis correction led to a 3% improvement in sensitivity and specificity, which increased the precision and accuracy of the ML algorithm.

Conclusions: This work has confirmed the performance and application of ML algorithms for screening in systematic reviews of preclinical animal studies. It has highlighted the novel use of ML algorithms to identify human error. This needs to be confirmed in other reviews, but represents a promising approach to integrating human decisions and automation in systematic review methodology.
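The screening criteria and the error analysis described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `margin` parameter and the toy data are assumptions made for illustration, and the scores stand in for the inclusion likelihoods that a cross-validated classifier would assign to human-screened records.

```python
from typing import List, Sequence, Tuple


def confusion_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) when records scoring
    at or above `threshold` are predicted as included."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(labels)
    return sensitivity, specificity, accuracy


def pick_threshold(labels: Sequence[int], scores: Sequence[float],
                   target_sensitivity: float = 0.95) -> float:
    """Return the highest observed score that, used as a cut-off, still
    meets the target sensitivity. Raising the cut-off never lowers
    specificity, so this choice maximises specificity under the target."""
    for candidate in sorted(set(scores), reverse=True):
        sensitivity, _, _ = confusion_metrics(labels, scores, candidate)
        if sensitivity >= target_sensitivity:
            return candidate
    return min(scores)  # fallback: include everything


def flag_discrepancies(labels: Sequence[int], scores: Sequence[float],
                       threshold: float, margin: float = 0.3) -> List[int]:
    """Return indices of records whose human decision disagrees with the
    ML score by at least `margin` (a hypothetical tuning parameter).
    These are candidates for a second human look, not confirmed errors."""
    flagged = []
    for i, (y, s) in enumerate(zip(labels, scores)):
        if y == 0 and s >= threshold + margin:
            flagged.append(i)  # human excluded, model strongly includes
        elif y == 1 and s <= threshold - margin:
            flagged.append(i)  # human included, model strongly excludes
    return flagged


# Toy data (invented): 1 = included by the human screener, 0 = excluded.
human_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
ml_scores = [0.9, 0.8, 0.7, 0.6, 0.95, 0.4, 0.3, 0.2, 0.1, 0.05]

cutoff = pick_threshold(human_labels, ml_scores, target_sensitivity=0.95)
print(confusion_metrics(human_labels, ml_scores, cutoff))
print(flag_discrepancies(human_labels, ml_scores, cutoff))
```

In this toy run, the only flagged record is the one the human screener excluded but the model scored highly. In the workflow the abstract describes, such records would be re-screened by a human and the model retrained on the corrected labels.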
