Ensemble model for pre-discharge ICD-10 coding prediction

Translating medical diagnoses into clinical codes has a wide range of applications in billing, aetiology analysis, and auditing. Coding is currently a manual effort, and automating the task is not straightforward. The challenges include messy, noisy clinical records, complex cases, and the very large ICD-10 code space. Previous work relied mainly on discharge notes for prediction and was applied to data of very limited scale. We propose an ensemble model that incorporates multiple clinical data sources for accurate code prediction. We further propose an assessment mechanism that provides confidence rates for the predicted outcomes. Extensive experiments were performed on two new real-world clinical datasets (inpatient and outpatient) with unaltered casemix distributions from Maharaj Nakorn Chiang Mai Hospital. For multi-label classification on the inpatient and outpatient datasets respectively, we obtain average precision of 0.73 and 0.58, F1-scores of 0.56 and 0.35, and accuracy of 0.71 and 0.4 in predicting the principal diagnosis.
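To make the evaluation setting concrete, the following is a minimal sketch of how per-source code probabilities might be combined and scored. It is not the ensemble described above: the simple averaging rule, the 0.5 threshold, the micro-averaged F1, and the names `ensemble_scores`, `predict_codes`, `micro_f1`, and `principal_accuracy` are all assumptions made for illustration, and the example probabilities are invented.

```python
from typing import Dict, List, Set


def ensemble_scores(per_source: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-code probabilities across data sources.

    Assumption: plain unweighted averaging; a code missing from a source
    counts as probability 0 for that source.
    """
    codes = set().union(*per_source)
    return {c: sum(s.get(c, 0.0) for s in per_source) / len(per_source)
            for c in codes}


def predict_codes(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return codes at or above the threshold, highest score first."""
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return [c for c in ranked if scores[c] >= threshold]


def micro_f1(pred: List[Set[str]], gold: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (record, code) decisions."""
    tp = sum(len(p & g) for p, g in zip(pred, gold))
    fp = sum(len(p - g) for p, g in zip(pred, gold))
    fn = sum(len(g - p) for p, g in zip(pred, gold))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def principal_accuracy(ranked: List[List[str]], principal: List[str]) -> float:
    """Fraction of records whose top-ranked code is the principal diagnosis."""
    hits = sum(1 for r, p in zip(ranked, principal) if r and r[0] == p)
    return hits / len(principal)


# Hypothetical scores for one record from two sources (e.g. notes and labs).
notes = {"I10": 0.9, "E11.9": 0.4}
labs = {"I10": 0.5, "E11.9": 0.8}
codes = predict_codes(ensemble_scores([notes, labs]))
print(codes)
print(micro_f1([set(codes)], [{"I10"}]))
print(principal_accuracy([codes], ["I10"]))
```

Ranking the codes by score, and not only thresholding them, is what allows a principal-diagnosis accuracy to be reported alongside the multi-label metrics.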
