Synonym Extraction of Medical Terms from Clinical Text Using Combinations of Word Space Models

In information extraction, it is useful to know whether two signifiers have the same or very similar semantic content. Maintaining such information in a controlled vocabulary is, however, costly. This paper demonstrates how synonyms of medical terms can be extracted automatically from a large corpus of clinical text using distributional semantics. Combining Random Indexing and Random Permutation captures different lexical semantic aspects, which improves the ability to identify synonymic relations between terms. Of 340 synonym pairs from MeSH, 44% are successfully extracted within a list of ten suggestions. The models can also be used to map abbreviations to their full-length forms; simple pattern-based filtering of the suggestions yields substantial improvements.
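The combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensionality, window size, toy corpus, the summing of cosine scores from the two models, and the subsequence-based abbreviation filter are all assumptions chosen for illustration, and every function name (`index_vectors`, `train`, `suggest`, `plausible_expansion`) is hypothetical. Random Indexing adds the sparse random index vectors of neighbouring words to a term's context vector; Random Permutation does the same after rotating each index vector by its position relative to the term, so that word order is also encoded.

```python
import numpy as np

def index_vectors(vocab, dim=512, nonzeros=8, seed=0):
    """Assign each word a sparse ternary random index vector (+1/-1/0)."""
    rng = np.random.default_rng(seed)
    vecs = {}
    for word in sorted(vocab):
        v = np.zeros(dim)
        pos = rng.choice(dim, size=nonzeros, replace=False)
        v[pos[: nonzeros // 2]] = 1.0
        v[pos[nonzeros // 2:]] = -1.0
        vecs[word] = v
    return vecs

def train(sentences, index, dim, window=2, permute=False):
    """Build context vectors by summing neighbours' index vectors.

    permute=False: plain Random Indexing (bag of neighbours).
    permute=True:  Random Permutation, here approximated by rotating the
                   index vector by the neighbour's relative position.
    """
    ctx = {w: np.zeros(dim) for w in index}
    for sent in sentences:
        for i, target in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                vec = index[sent[j]]
                if permute:
                    vec = np.roll(vec, j - i)  # encode relative position
                ctx[target] += vec
    return ctx

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

def suggest(term, ri, rp, k=10):
    """Rank candidates by the sum of RI and RP cosine scores (an assumed
    combination strategy) and return the top k."""
    scores = {
        w: cosine(ri[term], ri[w]) + cosine(rp[term], rp[w])
        for w in ri if w != term
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def plausible_expansion(abbr, candidate):
    """Hypothetical pattern filter: same first letter, candidate longer,
    and the abbreviation's letters occur in order in the candidate."""
    a, c = abbr.lower(), candidate.lower()
    if not a or len(c) <= len(a) or c[0] != a[0]:
        return False
    rest = iter(c)
    return all(ch in rest for ch in a)

# Toy corpus (invented): "infarction" and "mi" share identical contexts.
corpus = [
    "patient had infarction yesterday".split(),
    "patient had mi yesterday".split(),
    "nurse gave aspirin today".split(),
]
DIM = 512
index = index_vectors({w for s in corpus for w in s}, dim=DIM)
ri_model = train(corpus, index, DIM, permute=False)
rp_model = train(corpus, index, DIM, permute=True)
top = suggest("infarction", ri_model, rp_model, k=3)
```

On this toy corpus the two terms that occur in identical contexts rank each other first. On real clinical text the ranking would depend heavily on corpus size, preprocessing, and the parameter choices assumed here.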
