Text mining approach to predict hospital admissions using early medical records from the emergency department

OBJECTIVE Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce overcrowding and optimize the use of hospital resources. In this study, we use text mining methods to process early ED patient records structured according to the SOAP framework, and to predict future hospitalizations and discharges.

DESIGN We test different approaches for pre-processing the text records and for predicting hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ2 and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (linear kernel) and Nu-Support Vector Machine (linear kernel).

MEASUREMENTS Prediction performance is evaluated by F1-score. Precision and recall values are also reported for all text mining methods tested.

RESULTS Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%.

CONCLUSIONS The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams.
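As an illustration of the feature representations and the evaluation metrics named in the abstract, the following is a minimal sketch in plain Python, not the study's implementation. The function names, the whitespace tokenizer, the example notes, and the unsmoothed IDF form log(N/df) are all assumptions made for this example; the abstract does not state which TF-IDF variant or tokenizer was used.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all 1-grams up to n_max-grams as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def vectorize(docs, scheme="tfidf", n_max=3):
    """Map each document to a {term: weight} dict.

    scheme is one of "binary", "tf" or "tfidf" (hypothetical names).
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    counts = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    if scheme == "binary":
        return [{t: 1.0 for t in c} for c in counts]
    if scheme == "tf":
        return [{t: float(f) for t, f in c.items()} for c in counts]
    if scheme != "tfidf":
        raise ValueError(f"unknown scheme: {scheme}")
    # Document frequency: number of documents containing each term.
    df = Counter(t for c in counts for t in c)
    n_docs = len(docs)
    # Unsmoothed IDF, log(N / df); other variants exist.
    return [
        {t: f * math.log(n_docs / df[t]) for t, f in c.items()}
        for c in counts
    ]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical ED notes, invented for illustration only.
    notes = ["chest pain on exertion", "mild chest pain"]
    for scheme in ("binary", "tf", "tfidf"):
        print(scheme, vectorize(notes, scheme=scheme, n_max=2)[1])
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

In this sketch, a term that occurs in every document receives a TF-IDF weight of zero, which is one reason smoothed IDF variants are often preferred in practice. The resulting term weights would then feed a feature selection step and a classifier, which are omitted here.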
