Synonym Extraction of Medical Terms from Clinical Text Using Combinations of Word Space Models

In information extraction, it is useful to know whether two signifiers have the same or very similar semantic content. Maintaining such information in a controlled vocabulary is, however, costly. Here we demonstrate how synonyms of medical terms can be extracted automatically from a large corpus of clinical text using distributional semantics. Combining Random Indexing and Random Permutation captures different lexical semantic aspects and thereby improves our ability to identify synonymic relations between terms. Of 340 synonym pairs taken from MeSH, 44% are successfully extracted within a list of ten suggestions. The models can also be used to map abbreviations to their full-length forms, and simple pattern-based filtering of the suggestions yields substantial improvements on that task.
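The following is a minimal sketch of the approach described above. It is not the authors' implementation. Random Indexing (RI) adds the sparse random index vectors of neighbouring words into a word's context vector, ignoring order. Random Permutation (RP) first rotates each neighbour's index vector by its relative position, which encodes word order. The dimensionality `DIM`, the number of nonzero entries `NONZERO`, and the window size `WINDOW` are assumed values. Combining the two models by summing their cosine similarities is also an assumption, as is the abbreviation filter `could_expand`, a hypothetical pattern rule.

```python
import math
import random

DIM = 1000    # dimensionality of the word space (assumed value)
NONZERO = 8   # nonzero entries per index vector (assumed value)
WINDOW = 2    # context words on each side of the focus word (assumed value)


def index_vector(rng):
    """Sparse ternary index vector {position: +1 or -1}, half positive, half negative."""
    positions = rng.sample(range(DIM), NONZERO)
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def build_models(sentences, seed=0):
    """Train RI and RP context vectors in a single pass over tokenized sentences."""
    rng = random.Random(seed)
    index, ri, rp = {}, {}, {}
    for sent in sentences:
        for w in sent:
            if w not in index:
                index[w] = index_vector(rng)
                ri[w] = [0.0] * DIM
                rp[w] = [0.0] * DIM
        for i, w in enumerate(sent):
            for offset in range(-WINDOW, WINDOW + 1):
                j = i + offset
                if offset == 0 or not 0 <= j < len(sent):
                    continue
                for pos, val in index[sent[j]].items():
                    ri[w][pos] += val                   # RI: order-free sum
                    rp[w][(pos + offset) % DIM] += val  # RP: rotate by relative position
    return ri, rp


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def suggest(term, ri, rp, k=10):
    """Top-k synonym candidates, ranked by RI cosine + RP cosine (assumed combination)."""
    scores = {
        other: cosine(ri[term], ri[other]) + cosine(rp[term], rp[other])
        for other in ri
        if other != term
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def could_expand(abbrev, candidate):
    """Hypothetical filter: the candidate is longer than the abbreviation, shares its
    first letter, and contains all abbreviation letters in order."""
    abbrev, candidate = abbrev.lower(), candidate.lower()
    if len(candidate) <= len(abbrev) or candidate[0] != abbrev[0]:
        return False
    remaining = iter(candidate)
    return all(ch in remaining for ch in abbrev)  # subsequence test


if __name__ == "__main__":
    corpus = [
        "patient has fever and cough".split(),
        "patient has pyrexia and cough".split(),
    ]
    ri, rp = build_models(corpus)
    print(suggest("fever", ri, rp, k=3))
    print(could_expand("ekg", "elektrokardiogram"))
```

In the toy corpus, "fever" and "pyrexia" occur in identical contexts, so both models give them identical vectors. Abbreviation suggestions can be post-filtered the same way, for example with `[c for c in suggest(abbr, ri, rp) if could_expand(abbr, c)]`.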
