A Hybrid Method for ICD-10 Auto-Coding of Chinese Diagnoses

The Chinese Version of Classification and Codes of Diseases (CCD) is an expanded version of ICD-10. In China, hospitals are required to assign CCD codes to discharge diagnoses. Because skilled CCD coders are in short supply while the demand for coding efficiency keeps growing, an automatic CCD coding method is urgently needed. In this study, a hybrid auto-coding method is proposed based on lexical characteristics obtained by analyzing a corpus of 1,537 diagnoses with normative CCD codes. The method combines a rule-based approach, a distributed semantic similarity approach based on Chinese characters, and a dictionary-based approach. The rule-based approach proved efficient and precise, but at the cost of considerable time and manpower. The semantic similarity approach performed poorly. The conventional dictionary-based approach turned out to be the most significant component. In testing, the hybrid approach achieved a final accuracy of 96.9%.
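To make the idea of combining the three approaches concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple cascade: hand-written rules are tried first, then an exact dictionary lookup, and finally a character-level similarity fallback that averages per-character vectors and compares them by cosine similarity. The abstract does not state how the components are ordered or combined, or what threshold is used. All function names, rules, dictionary entries, the placeholder codes (`CODE-HTN`, `CODE-APP`, `CODE-T2DM`), the 0.8 threshold and the 3-dimensional character vectors are hypothetical. Real distributed character vectors would be trained on a corpus.

```python
import math
import re
from typing import Dict, List, Optional, Tuple

# All data below is hypothetical toy data with placeholder codes, not real CCD codes.
RULES: List[Tuple[str, str]] = [(r"^高血压", "CODE-HTN")]  # (regex pattern, code)
DICTIONARY: Dict[str, str] = {"2型糖尿病": "CODE-T2DM", "急性阑尾炎": "CODE-APP"}
CHAR_VECS: Dict[str, List[float]] = {  # toy "distributed" character vectors
    "急": [1.0, 0.0, 0.0], "性": [1.0, 0.0, 0.0],
    "阑": [0.0, 1.0, 0.0], "尾": [0.0, 1.0, 0.0], "炎": [0.0, 1.0, 0.0],
    "糖": [0.0, 0.0, 1.0], "尿": [0.0, 0.0, 1.0], "病": [0.0, 0.0, 1.0],
}


def rule_lookup(diagnosis: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Return the code of the first rule whose regex matches the diagnosis."""
    for pattern, code in rules:
        if re.search(pattern, diagnosis):
            return code
    return None


def dictionary_lookup(diagnosis: str, dictionary: Dict[str, str]) -> Optional[str]:
    """Exact match of the (whitespace-stripped) diagnosis against a term-to-code table."""
    return dictionary.get(diagnosis.strip())


def char_vector(text: str, char_vecs: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of known characters; unknown characters are skipped."""
    vecs = [char_vecs[c] for c in text if c in char_vecs]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_lookup(diagnosis: str, dictionary: Dict[str, str],
                      char_vecs: Dict[str, List[float]],
                      threshold: float = 0.8) -> Optional[str]:
    """Return the code of the most similar dictionary term scoring >= threshold."""
    query = char_vector(diagnosis, char_vecs)
    if query is None:
        return None
    best_code: Optional[str] = None
    best_score = -1.0
    for term, code in dictionary.items():
        vec = char_vector(term, char_vecs)
        if vec is None:
            continue
        score = cosine(query, vec)
        if score >= threshold and score > best_score:
            best_code, best_score = code, score
    return best_code


def hybrid_code(diagnosis: str, rules: List[Tuple[str, str]],
                dictionary: Dict[str, str],
                char_vecs: Dict[str, List[float]]) -> Optional[str]:
    """Assumed cascade order: rules, then exact dictionary, then similarity fallback."""
    return (rule_lookup(diagnosis, rules)
            or dictionary_lookup(diagnosis, dictionary)
            or similarity_lookup(diagnosis, dictionary, char_vecs))


if __name__ == "__main__":
    for dx in ["高血压3级", "急性阑尾炎", "阑尾炎急性", "骨折"]:
        print(dx, hybrid_code(dx, RULES, DICTIONARY, CHAR_VECS))
```

In this toy run, "高血压3级" is caught by the rule, "急性阑尾炎" by the dictionary, the reordered "阑尾炎急性" only by character similarity, and "骨折" (no known characters) stays uncoded. Placing the cheap, precise lookups before the similarity fallback is one plausible design choice; the abstract's findings (precise rules, a weak similarity component, a dominant dictionary) are consistent with it but do not confirm it.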