An enhanced computational feature selection method for medical synonym identification via bilingualism and multi-corpus training

Medical synonym identification is an important part of medical natural language processing (NLP). For Chinese medical terms, however, synonym identification still suffers from low precision and low recall. To address this, we propose a method for identifying Chinese medical synonyms. We first select 13 features, covering both Chinese and English features. We then study the synonym identification results obtained with each feature alone and with different combinations of the features. By comparing these results, we identify an optimal combination of features for Chinese medical synonym identification. Experiments show that the selected features achieve 97.37% precision, 96.00% recall, and a 97.33% F1 score.
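The comparison of single features and feature combinations can be illustrated with a minimal sketch. It assumes an exhaustive search over feature subsets scored by F1, which the abstract does not confirm as the actual procedure; with 13 features this would mean 2^13 - 1 = 8191 non-empty subsets. The feature names, toy scores, term pairs, mean-score decision rule, and 0.5 threshold are all hypothetical and stand in for the paper's real features and classifier.

```python
from itertools import combinations


def prf1(predicted, gold):
    """Return (precision, recall, F1) for sets of predicted and gold synonym pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def make_evaluator(feature_scores, candidates, gold, threshold=0.5):
    """Build a function that maps a feature subset to its F1 score.

    Assumption: a pair is predicted to be a synonym pair when the mean
    of the selected feature scores reaches the threshold.
    """
    def evaluate(subset):
        predicted = {
            pair for pair in candidates
            if sum(feature_scores[name][pair] for name in subset) / len(subset) >= threshold
        }
        return prf1(predicted, gold)[2]
    return evaluate


def best_feature_subset(feature_names, evaluate):
    """Score every non-empty subset; keep the first one with the highest F1."""
    best, best_f1 = None, -1.0
    for k in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, k):
            f1 = evaluate(subset)
            if f1 > best_f1:  # ties keep the earlier, smaller subset
                best, best_f1 = subset, f1
    return best, best_f1


# Hypothetical toy data: three candidate term pairs, two of them true synonyms.
candidates = [("a", "b"), ("c", "d"), ("e", "f")]
gold = {("a", "b"), ("c", "d")}
feature_scores = {
    "edit_similarity": {("a", "b"): 0.9, ("c", "d"): 0.1, ("e", "f"): 0.2},
    "embedding_similarity": {("a", "b"): 0.4, ("c", "d"): 0.9, ("e", "f"): 0.6},
    "translation_match": {("a", "b"): 0.8, ("c", "d"): 0.8, ("e", "f"): 0.1},
}

evaluate = make_evaluator(feature_scores, candidates, gold)
best, best_f1 = best_feature_subset(list(feature_scores), evaluate)
print(best, best_f1)
```

Because ties keep the earlier subset, the search prefers the smallest feature set that reaches the top score. A real evaluation would score subsets on held-out data rather than on the same pairs used to pick the subset.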
