Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study

Background Most current state-of-the-art models for searching the International Classification of Diseases, Tenth Revision Clinical Modification (ICD-10-CM) codes use word embedding technology to capture useful semantic properties. However, they are limited by the quality of initial word embeddings. Word embedding trained by electronic health records (EHRs) is considered the best, but the vocabulary diversity is limited by previous medical records. Thus, we require a word embedding model that maintains the vocabulary diversity of open internet databases and the medical terminology understanding of EHRs. Moreover, we need to consider the particularity of the disease classification, wherein discharge notes present only positive disease descriptions. Objective We aimed to propose a projection word2vec model and a hybrid sampling method. In addition, we aimed to conduct a series of experiments to validate the effectiveness of these methods. Methods We compared the projection word2vec model and traditional word2vec model using two corpora sources: English Wikipedia and PubMed journal abstracts. We used seven published datasets to measure the medical semantic understanding of the word2vec models and used these embeddings to identify the three–character-level ICD-10-CM diagnostic codes in a set of discharge notes. On the basis of embedding technology improvement, we also tried to apply the hybrid sampling method to improve accuracy. The 94,483 labeled discharge notes from the Tri-Service General Hospital of Taipei, Taiwan, from June 1, 2015, to June 30, 2017, were used. To evaluate the model performance, 24,762 discharge notes from July 1, 2017, to December 31, 2017, from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were tested. The F-measure, which is the major global measure of effectiveness, was adopted. Results In medical semantic understanding, the original EHR embeddings and PubMed embeddings exhibited superior performance to the original Wikipedia embeddings. After projection training technology was applied, the projection Wikipedia embeddings exhibited an obvious improvement but did not reach the level of original EHR embeddings or PubMed embeddings. In the subsequent ICD-10-CM coding experiment, the model that used both projection PubMed and Wikipedia embeddings had the highest testing mean F-measure (0.7362 and 0.6693 in Tri-Service General Hospital and the seven other hospitals, respectively). Moreover, the hybrid sampling method was found to improve the model performance (F-measure=0.7371/0.6698). Conclusions The word embeddings trained using EHR and PubMed could understand medical semantics better, and the proposed projection word2vec model improved the ability of medical semantics extraction in Wikipedia embeddings. Although the improvement from the projection word2vec model in the real ICD-10-CM coding task was not substantial, the models could effectively handle emerging diseases. The proposed hybrid sampling method enables the model to behave like a human expert.

[1]  Amol Wagholikar,et al.  Automated Reconciliation of Radiology Reports and Discharge Summaries , 2015, AMIA.

[2]  Lisa M. Lee,et al.  Public health surveillance and knowing about health in the context of growing sources of health data. , 2011, American journal of preventive medicine.

[3]  Belinda Torney,et al.  Casemix Funding Optimisation: Working Together to Make the Most of Every Episode , 2010, Health information management : journal of the Health Information Management Association of Australia.

[4]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Magda Tsintsadze,et al.  Natural Language Processing Based Instrument for Classification of Free Text Medical Records , 2016, BioMed research international.

[6]  D. Muscatello,et al.  Prospective surveillance of excess mortality due to influenza in New South Wales: feasibility and statistical approach. , 2008, Communicable diseases intelligence quarterly report.

[7]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[8]  Angelos Hliaoutakis,et al.  Semantic Similarity Measures in MeSH Ontology and their application to Information Retrieval on Medline , 2005 .

[9]  Ming Zhang,et al.  Automatic classification of diseases from free-text death certificates for real-time surveillance , 2015, BMC Medical Informatics and Decision Making.

[10]  Reed McEwan,et al.  Corpus domain effects on distributional semantic modeling of medical terms , 2016, Bioinform..

[11]  E. Castilla,et al.  ICD‐10 impact on ascertainment and accuracy of oral cleft cases as recorded by the Brazilian national live birth information system , 2018, American journal of medical genetics. Part A.

[12]  Bram van Ginneken,et al.  A survey on deep learning in medical image analysis , 2017, Medical Image Anal..

[13]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[14]  Hongfang Liu,et al.  A Comparison of Word Embeddings for the Biomedical Natural Language Processing , 2018, J. Biomed. Informatics.

[15]  Jeffrey Dean,et al.  Scalable and accurate deep learning with electronic health records , 2018, npj Digital Medicine.

[16]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[17]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[18]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19]  Susan B Jaglal,et al.  Validation of Algorithm to Identify Persons with Non-traumatic Spinal Cord Dysfunction in Canada Using Administrative Health Data. , 2017, Topics in spinal cord injury rehabilitation.

[20]  John C. Platt,et al.  Learning Discriminative Projections for Text Similarity Measures , 2011, CoNLL.

[21]  Yelong Shen,et al.  Learning semantic representations using convolutional neural networks for web search , 2014, WWW.

[22]  Ted Pedersen,et al.  Towards a framework for developing semantic relatedness reference standards , 2011, J. Biomed. Informatics.

[23]  Liyana Shuib,et al.  Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection , 2017, PloS one.

[24]  Goran Nenadic,et al.  Text mining of cancer-related information: Review of current status and future directions , 2014, Int. J. Medical Informatics.

[25]  Chong Wang,et al.  Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[26]  Anthony N. Nguyen,et al.  Automatic ICD-10 classification of cancers from free-text death certificates , 2015, Int. J. Medical Informatics.

[27]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[28]  Geoffrey Zweig,et al.  Achieving Human Parity in Conversational Speech Recognition , 2016, ArXiv.

[29]  Ted Pedersen,et al.  UMLS-Interface and UMLS-Similarity : Open Source Software for Measuring Paths and Semantic Similarity , 2009, AMIA.

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Terrence Adam,et al.  Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study. , 2010, AMIA ... Annual Symposium proceedings. AMIA Symposium.

[32]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[33]  Hude Quan,et al.  Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data , 2018, J. Biomed. Informatics.

[34]  Christopher Meek,et al.  Semantic Parsing for Single-Relation Question Answering , 2014, ACL.

[35]  Ted Pedersen,et al.  Measures of semantic similarity and relatedness in the biomedical domain , 2007, J. Biomed. Informatics.

[36]  T. Murdoch,et al.  The inevitable application of big data to health care. , 2013, JAMA.

[37]  Chia-Jung Hsu,et al.  Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes , 2017, Journal of medical Internet research.