Detection of Hospital Acquired Infections in sparse and noisy Swedish patient records: A machine learning approach using Naïve Bayes, Support Vector Machines and C4.5

Hospital Acquired Infections (HAI) pose a significant risk to patients' health, and their surveillance adds to the workload of hospital medical staff and hospital management. Our overall aim is to build a system that reliably retrieves all patient records that potentially contain HAI, reducing the burden on hospital staff of checking patient records manually. In other words, we emphasize recall when detecting HAI (aiming at 100%) while keeping precision as high as possible. The present study is experimental in nature: it applies Naïve Bayes (NB), Support Vector Machines (SVM) and a C4.5 Decision Tree to the problem and evaluates the efficiency of this approach. The three classifiers showed similar overall performance. SVM yielded the best recall, 89.8%, for records that contain HAI. We present a machine learning approach as an alternative to the rule-based systems that are more common for this task. The classifiers were applied to a small and noisy dataset, and the results indicate the potential of learning algorithms for detecting HAI. Further research will have to focus on optimizing the performance of the classifiers and testing them on larger datasets.
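The recall-first objective described above can be illustrated with a small Python sketch. It is not the study's implementation: the abstract does not say how the classifiers were tuned, so the threshold search below is an assumption, and the function names, labels and scores are hypothetical toy values chosen only to show how precision and recall trade off.

```python
from typing import List, Tuple


def precision_recall(labels: List[int], predictions: List[int]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class (1 = record contains HAI)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_threshold_for_recall(
    labels: List[int], scores: List[float], target_recall: float = 1.0
) -> Tuple[float, float, float]:
    """Among all score thresholds that reach the target recall,
    return (threshold, precision, recall) with the highest precision."""
    best = None
    for threshold in sorted(set(scores)):
        predictions = [1 if s >= threshold else 0 for s in scores]
        precision, recall = precision_recall(labels, predictions)
        if recall >= target_recall and (best is None or precision > best[1]):
            best = (threshold, precision, recall)
    if best is None:
        raise ValueError("no threshold reaches the requested recall")
    return best


# Hypothetical toy data: 1 = HAI record, 0 = non-HAI record.
toy_labels = [1, 0, 1, 0, 0, 1]
# Hypothetical classifier confidence that each record contains HAI.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.35]

threshold, precision, recall = best_threshold_for_recall(toy_labels, toy_scores)
print(f"threshold={threshold}, precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, retrieving every HAI record requires accepting two false positives, which is the kind of trade-off a recall-oriented screening system has to make: staff review a few extra records so that no infected case is missed.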
