Exploratory Analysis of Methods for Automated Classification of Laboratory Test Orders into Syndromic Groups in Veterinary Medicine

Background: A recent focus on earlier detection of pathogen introduction into human and animal populations has led to the development of surveillance systems based on automated monitoring of health data. Real-time or near-real-time monitoring of pre-diagnostic data requires automated classification of records into syndromes (syndromic surveillance), using algorithms that incorporate medical knowledge reliably and efficiently while remaining comprehensible to end users.

Methods: This paper describes the application of two machine learning methods (Naïve Bayes and Decision Trees) and of rule-based methods to extract syndromic information from laboratory test requests submitted to a veterinary diagnostic laboratory.

Results: High performance (F1-macro = 0.9995) was achieved with a rule-based syndrome classifier built by rule induction followed by manual modification during the construction phase; this approach also made the resulting classification process clearly interpretable. An unmodified rule induction algorithm achieved an F1-micro score of 0.979, but this fell to 0.677 when performance for individual classes was averaged without weighting (F1-macro), because the algorithm failed to learn 3 of the 16 classes from the training set. Decision Trees were as interpretable as the rule-based approaches but achieved a lower F1-micro score of 0.923 (falling to 0.311 when classes were given equal weight). A Naïve Bayes classifier learned all classes and achieved high performance (F1-micro = 0.994, F1-macro = 0.955); however, its classification process is not transparent to domain experts.

Conclusion: A manually customised rule set made it possible to build a system that classifies laboratory tests into syndromic groups with very high performance and high interpretability for domain experts. Further research is required to develop internal validation rules so that automated methods for updating model rules without user input can be established.
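To illustrate why a rule-based classifier is transparent to domain experts, the following minimal Python sketch assigns free-text test requests to syndromes using an ordered list of keyword rules. The syndrome names, keywords, first-match-wins ordering, and the `classify` function are all hypothetical. The study's actual rules, which were induced from training data and then manually modified, are not reproduced here.

```python
import re

# HYPOTHETICAL rule set: (syndrome, pattern) pairs checked in order; the
# first matching rule wins. In the study, rules were first induced from
# training data and then edited by hand; these patterns are invented examples.
RULES = [
    ("gastrointestinal", re.compile(r"\b(fecal|faecal|salmonella|parasites?)\b", re.IGNORECASE)),
    ("respiratory", re.compile(r"\b(influenza|pneumonia|nasal swab)\b", re.IGNORECASE)),
    ("reproductive", re.compile(r"\b(abortion|brucella|fetus)\b", re.IGNORECASE)),
]


def classify(test_request, rules=RULES, default="unclassified"):
    """Assign a laboratory test request to the first syndrome whose rule matches."""
    for syndrome, pattern in rules:
        if pattern.search(test_request):
            return syndrome
    # No rule fired: leave the record outside every syndromic group.
    return default


for request in ["Salmonella culture, fecal", "Brucella serology", "Complete blood count"]:
    print(f"{request!r} -> {classify(request)}")
```

Because each rule is a plain, readable pattern, an expert can see exactly why a request was assigned to a syndrome and can edit the rule list directly. This is the kind of interpretability that the abstract contrasts with the Naïve Bayes classifier.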
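The gap between the F1-micro and F1-macro scores reported above follows from how the two averages are computed. The minimal Python sketch below uses hypothetical toy data (the class names and counts are invented, not taken from the study). Micro-averaging pools true positives, false positives, and false negatives across all classes, so a classifier that ignores rare classes can still score well. Macro-averaging gives every class equal weight, so each unlearned class contributes an F1 of zero. Per-class F1 is computed as 2TP / (2TP + FP + FN), which is algebraically the harmonic mean of precision and recall. Scoring a class with a zero denominator as 0 is a convention chosen for this sketch.

```python
from collections import Counter


def f1_scores(y_true, y_pred, labels):
    """Return (micro_f1, macro_f1) for single-label, multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative

    # Micro-average: pool counts over all classes, then compute one F1.
    TP = sum(tp[c] for c in labels)
    FP = sum(fp[c] for c in labels)
    FN = sum(fn[c] for c in labels)
    pooled = 2 * TP + FP + FN
    micro = 2 * TP / pooled if pooled else 0.0

    # Macro-average: F1 per class, then an unweighted mean over classes.
    # A class that is never predicted correctly contributes F1 = 0.
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)
    return micro, macro


# HYPOTHETICAL toy data: one common syndrome and two rare ones that the
# classifier never learned, so every record is assigned the common class.
labels = ["gastrointestinal", "respiratory", "neurological"]
y_true = ["gastrointestinal"] * 96 + ["respiratory"] * 2 + ["neurological"] * 2
y_pred = ["gastrointestinal"] * 100

micro, macro = f1_scores(y_true, y_pred, labels)
print(f"F1-micro = {micro:.3f}, F1-macro = {macro:.3f}")
# F1-micro = 0.960, F1-macro = 0.327
```

In the same way, assuming the 3 unlearned classes were present in the evaluation data, each contributes zero to the unweighted mean. This caps F1-macro at 13/16 (about 0.81) for the unmodified rule induction algorithm, however well the remaining classes are classified.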
