Traditional Chinese medicine symptom normalization approach leveraging hierarchical semantic information and text matching with attention mechanism

Traditional Chinese medicine (TCM) symptom normalization is difficult because of three challenges: the same symptom can have different literal descriptions, one description can correspond to multiple normalized symptoms (one-to-many descriptions), and different symptoms can share similar literal descriptions. To address these problems, we propose a novel two-step approach that uses hierarchical semantic information representing the functional characteristics of symptoms, together with a text matching model that integrates this hierarchical semantic information through an attention mechanism. In this study, we constructed a symptom normalization dataset and a TCM normalized symptom dictionary containing normalized symptom words, and assigned the symptoms to 24 classes of functional characteristics. First, we built a multi-label text classifier that extracts the hierarchical semantic information from each symptom description; the predicted classes are then used to collect the corresponding normalized symptoms and filter the candidate set. Then we designed a text matching model that combines multi-granularity language features with an attention mechanism and uses the hierarchical semantic information to calculate the matching score between a symptom description and each candidate normalized symptom word. We compared our approach with other baselines on real-world data. Our approach achieves the best performance, with Hit@1, Hit@3, and Hit@10 of 0.821, 0.953, and 0.993, respectively, and a MeanRank of 1.596, significantly outperforming the baselines on the symptom normalization task. These results demonstrate the effectiveness of our approach for TCM symptom normalization and indicate the promise of this research direction.
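The filter-then-rank structure and the Hit@k and MeanRank metrics reported above can be illustrated with a small sketch. It is not the authors' implementation. It assumes a toy dictionary that maps each normalized symptom to hypothetical functional classes (not the paper's 24 classes) and treats the classifier's predicted classes as given. It also replaces the attention-based matching model with a simple character-overlap (Jaccard) score, used purely as a stand-in. All names (`CLASS_OF`, `filter_candidates`, `char_jaccard`, `rank_of_gold`) and the penalty rank assigned to gold terms removed by the filter are illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical dictionary: normalized symptom -> functional classes (toy labels only).
CLASS_OF: Dict[str, Set[str]] = {
    "失眠": {"sleep"},
    "多梦": {"sleep"},
    "头痛": {"head"},
    "头晕": {"head"},
    "纳差": {"digestion"},
}


def filter_candidates(predicted: Set[str], class_of: Dict[str, Set[str]]) -> List[str]:
    """Step 1: keep normalized symptoms that share at least one predicted class."""
    return [term for term, classes in class_of.items() if classes & predicted]


def char_jaccard(a: str, b: str) -> float:
    """Stand-in matching score (character-set Jaccard), NOT the attention-based model."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_of_gold(description: str, gold: str, candidates: Sequence[str],
                 score: Callable[[str, str], float], miss_rank: int) -> int:
    """Step 2: sort candidates by score (stable on ties); return the 1-based rank of gold."""
    ranked = sorted(candidates, key=lambda c: score(description, c), reverse=True)
    # Assumption: a gold term removed by the step-1 filter gets a fixed penalty rank.
    return ranked.index(gold) + 1 if gold in ranked else miss_rank


def hit_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose gold term appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def mean_rank(ranks: Sequence[int]) -> float:
    """Average 1-based rank of the gold term (lower is better)."""
    return sum(ranks) / len(ranks)


# (description, classes predicted by a hypothetical classifier, gold normalized symptom)
QUERIES: List[Tuple[str, Set[str], str]] = [
    ("经常头痛", {"head"}, "头痛"),
    ("头昏沉", {"head"}, "头晕"),
    ("睡眠差多梦", {"sleep"}, "多梦"),
    ("纳呆食少", {"head"}, "纳差"),  # classifier error: the gold term is filtered out
]

ranks = [
    rank_of_gold(desc, gold, filter_candidates(pred, CLASS_OF), char_jaccard,
                 miss_rank=len(CLASS_OF) + 1)
    for desc, pred, gold in QUERIES
]
print(ranks, hit_at_k(ranks, 1), hit_at_k(ranks, 3), mean_rank(ranks))
# → [1, 2, 1, 6] 0.5 0.75 2.5
```

The second query ties under literal overlap (头痛 and 头晕 each share only 头 with 头昏沉), which is the kind of case a semantic matching model is meant to resolve. The last query shows that an error in step 1 caps what step 2 can recover, because the gold term never reaches the ranker.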
