Detecting Healthcare-Associated Infections in Electronic Health Records: Evaluation of Machine Learning and Preprocessing Techniques

Healthcare-associated infections (HAI) are infections that patients acquire in the course of medical treatment. Because HAI are a severe public health problem, detecting and monitoring them in healthcare documentation is an important topic to address. Research on automated systems has increased over the past years, but performance still needs improvement. The dataset in this study consists of 214 records obtained from a Point-Prevalence Survey. The records are manually classified into HAI and NoHAI records. Nine different preprocessing steps are carried out on the data. Two learning algorithms, Random Forest (RF) and Support Vector Machines (SVM), are applied to the data. The aim is to determine which of the two algorithms is more applicable to the task and whether the preprocessing methods affect performance. RF obtains the best performance results, yielding an F1-score of 85% and an AUC of 0.85 when lemmatisation is used as a preprocessing technique. Irrespective of which preprocessing method is used, RF yields higher recall values than SVM, with a statistically significant difference for all but one preprocessing method. For each classifier considered separately, the choice of preprocessing method led to no statistically significant improvement in performance results.
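As an illustration of the evaluation measures named above, the following minimal sketch computes precision, recall and F1-score for the HAI class from gold and predicted labels. The function name `precision_recall_f1` and the six toy labels are hypothetical and chosen only for illustration; they are not taken from the study's data or code.

```python
def precision_recall_f1(gold, predicted, positive="HAI"):
    """Return (precision, recall, F1) for the positive class.

    Hypothetical helper for illustration; not the study's implementation.
    """
    pairs = list(zip(gold, predicted))
    # True positives: HAI records correctly predicted as HAI.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: NoHAI records predicted as HAI.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: HAI records the classifier missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy labels (assumed, not from the 214-record dataset).
gold = ["HAI", "HAI", "HAI", "NoHAI", "NoHAI", "NoHAI"]
predicted = ["HAI", "HAI", "NoHAI", "HAI", "NoHAI", "NoHAI"]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Recall is the measure most relevant to the comparison reported above, since a missed HAI record (a false negative) lowers recall without affecting precision.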
