Paper Information - Acronym Disambiguation Using Word Embedding

According to AcronymFinder.com, one of the world's largest and most comprehensive dictionaries of acronyms, an average of 37 new human-edited acronym definitions are added every day. To date, the site lists 379,918 acronyms with 4,766,899 definitions, an average of 12.5 definitions per acronym. Identifying what exactly an acronym means in a given context is therefore an important research topic for both document comprehension and document retrieval. In this paper, we propose two word-embedding-based models for acronym disambiguation. Word embeddings represent words in a continuous, multidimensional vector space, so that the semantic similarity between words can be computed easily from the distance between their vectors. We evaluate the models on the MSH dataset and the ScienceWISE dataset, and both models outperform state-of-the-art methods in accuracy. The experimental results show that word embeddings help to improve acronym disambiguation.
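To make the idea of disambiguating by vector distance concrete, the following is a minimal sketch and not a reproduction of either model proposed in the paper. It assumes each candidate expansion can be represented by the average embedding of its own words, and it picks the expansion whose vector has the highest cosine similarity to the average embedding of the context words. The two-dimensional vectors in `TOY_EMBEDDINGS`, the `CANDIDATES` table, and the function names are all hypothetical values invented for illustration; real embeddings would come from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical 2-d embeddings, invented for this example:
# dimension 0 is roughly "clinical", dimension 1 roughly "statistical".
TOY_EMBEDDINGS = {
    "patient": [0.9, 0.1],
    "controlled": [0.5, 0.5],
    "analgesia": [1.0, 0.0],
    "principal": [0.2, 0.8],
    "component": [0.1, 0.9],
    "analysis": [0.3, 0.7],
    "pain": [0.95, 0.05],
    "dose": [0.8, 0.2],
    "variance": [0.05, 0.95],
    "matrix": [0.1, 0.9],
}

# Candidate expansions of the acronym "PCA", each listed with its own words.
CANDIDATES = {
    "patient-controlled analgesia": ["patient", "controlled", "analgesia"],
    "principal component analysis": ["principal", "component", "analysis"],
}


def disambiguate(context_words, candidates, embeddings):
    """Return the expansion closest to the context, or None if nothing can be scored."""
    context_vectors = [embeddings[w] for w in context_words if w in embeddings]
    if not context_vectors:
        return None
    context = average(context_vectors)

    scores = {}
    for expansion, words in candidates.items():
        vectors = [embeddings[w] for w in words if w in embeddings]
        if vectors:
            scores[expansion] = cosine(context, average(vectors))
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    print(disambiguate(["pain", "dose"], CANDIDATES, TOY_EMBEDDINGS))
    print(disambiguate(["variance", "matrix"], CANDIDATES, TOY_EMBEDDINGS))
```

Averaging word vectors is only one simple way to build context and sense representations; it is used here because it keeps the sketch short, not because the source confirms it as the authors' method.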
