Automatic ICD-10 coding algorithm using an improved longest common subsequence based on semantic similarity

ICD-10 (International Classification of Diseases, 10th Revision) is a classification of diseases, symptoms, procedures, and injuries. In patients' medical records, diseases are often described in free text, such as terms, phrases, and paraphrases, whose wording differs significantly from that used in the ICD-10 classification. This paper presents an improved approach based on the Longest Common Subsequence (LCS) and semantic similarity for the automatic coding of Chinese diagnoses, that is, for mapping the disease names given by clinicians to the disease names in ICD-10. The LCS of a set of strings is the longest sequence that is a subsequence of every string in the set; its elements must appear in the same order in each string but need not be contiguous. The proposed improved LCS method can increase the accuracy of Chinese disease-name mapping.
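To make the LCS definition concrete, the following is a minimal Python sketch of the classic two-string dynamic-programming LCS, together with a normalized score and a nearest-name lookup. The helper names `lcs`, `lcs_similarity`, and `best_match`, the normalization 2·|LCS| / (|a| + |b|), and the candidate disease names are illustrative assumptions, not the paper's improved method. In particular, the semantic-similarity component of the proposed approach is not reproduced here. Python strings iterate by code point, so each Chinese character is compared as one symbol.

```python
from typing import Sequence


def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (O(len(a) * len(b)) DP)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Walk back from dp[m][n] to recover one LCS (ties may have several).
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 2 * |LCS| / (|a| + |b|), a value in [0, 1]."""
    if not a and not b:
        return 1.0
    return 2 * len(lcs(a, b)) / (len(a) + len(b))


def best_match(diagnosis: str, icd_names: Sequence[str]) -> str:
    """Hypothetical lookup: the candidate name with the highest LCS similarity."""
    return max(icd_names, key=lambda name: lcs_similarity(diagnosis, name))


if __name__ == "__main__":
    # Illustrative candidates only, not an excerpt of the ICD-10 tabular list.
    candidates = ["急性上呼吸道感染", "急性支气管炎"]
    print(best_match("上呼吸道急性感染", candidates))  # → 急性上呼吸道感染
```

In this example the reordered clinician string still shares the 6-character subsequence 上呼吸道感染 with the first candidate, which gives a score of 12/16 = 0.75, against 4/14 ≈ 0.29 for the second candidate (common subsequence 急性). A purely character-level LCS treats synonymous terms written with different characters as unrelated. This is one plausible motivation for combining LCS with semantic similarity, as the abstract describes.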
