Detecting hospital-acquired infections: A document classification approach using support vector machines and gradient tree boosting

Hospital-acquired infections pose a significant risk to patient health, and their surveillance adds to the workload of hospital staff. Our overall aim is to build a surveillance system that reliably detects all patient records that potentially include hospital-acquired infections, reducing the burden of having hospital staff check patient records manually. This study focuses on applying text classification with support vector machines and gradient tree boosting to this problem. Neither method has previously been applied to detecting hospital-acquired infections in Swedish patient records, and in our experiments both give encouraging results. The best result is obtained with gradient tree boosting: 93.7 percent recall, 79.7 percent precision and 85.7 percent F1 score when using stemming. We show that simple preprocessing techniques and parameter tuning can yield high recall (our priority when screening patient records) at a level of precision appropriate for this task.
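
To make the reported measures concrete, the following is a minimal sketch of how recall, precision and F1 score are computed from a classifier's confusion counts. It is not the authors' evaluation code; the function names and the example counts are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts.

    tp: records with an infection that were flagged (true positives)
    fp: records without an infection that were flagged (false positives)
    fn: records with an infection that were missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_score(precision, recall)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the study.
precision, recall, f1 = precision_recall_f1(tp=90, fp=30, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In a screening setting a missed infection (false negative) costs more than a false alarm, which is why recall is the measure the study prioritizes. Note that scores averaged over several evaluation runs need not satisfy the harmonic-mean relation exactly: the F1 of the average precision and recall can differ from the average of per-run F1 scores.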
