Paper information - Predicting Publication Inclusion for Diagnostic Accuracy Test Reviews Using Random Forests and Topic Modelling

Predicting Publication Inclusion for Diagnostic Accuracy Test Reviews Using Random Forests and Topic Modelling

Finding all relevant publications for a systematic review can be a time-consuming task, especially in the field of diagnostic test accuracy. Therefore, the CLEF eHealth lab ‘technologically assisted reviews in empirical medicine’ was established to create a basis of comparison between various methods. In this paper we describe a method submitted to the lab. This method consists of a topic model used to extract features and a random forest to classify the relevant papers. Classifier performance shows an average decrease of 33.3% in workload (i.e., documents to read) when aiming for 95% recall, and of 24.9% when aiming for 100% recall. However, the workload reduction varies widely (from 79.3% to 0.9%) between the diagnostic test accuracy reviews.
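To make the workload figures concrete, here is a minimal sketch of how a workload reduction at a target recall could be computed. It assumes that documents are read in descending order of classifier score and that reading stops once the target recall is reached. That stopping rule, the function name `workload_reduction`, and the example data are illustrative assumptions, not the evaluation procedure used in the paper.

```python
import math


def workload_reduction(scores, labels, target_recall):
    """Return the fraction of documents that need not be read.

    Assumption (not from the paper): documents are read from the highest
    classifier score to the lowest, and reading stops as soon as the
    target recall over the relevant documents is reached.
    """
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("at least one relevant document is required")
    # Relevant documents needed for the target recall; the small epsilon
    # guards against floating-point noise before rounding up.
    needed = max(1, math.ceil(target_recall * total_relevant - 1e-9))
    # Rank document indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    found = 0
    for read_count, idx in enumerate(ranked, start=1):
        found += labels[idx]
        if found >= needed:
            return 1.0 - read_count / len(scores)
    return 0.0


# Hypothetical toy data: 8 documents, 3 of them relevant (label 1).
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = [1, 0, 1, 0, 1, 0, 0, 0]

for recall in (0.95, 1.0):
    print(recall, workload_reduction(example_scores, example_labels, recall))
```

With so few relevant documents, the 95% and 100% recall targets both require finding all three of them, so the two reductions coincide in this toy case. On larger reviews the two targets generally differ, as the averages reported above suggest.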

Sílvia Delgado Olabarriaga | Allard J. van Altena
