Disambiguating Biomedical Abbreviations Based on K-Nearest Neighbor with Weighted Voting Method

Information extraction from biomedical literature helps researchers make use of existing results in the biomedical field and supports further progress in biology and medicine. Aiming at the analysis and understanding of biomedical abbreviations, this paper proposes an approach for disambiguating them based on K-nearest neighbor (K-NN) with weighted voting. In this approach, labeled samples are generated automatically under the "One Sense Per Discourse" hypothesis, and the words describing the topic of a discourse are chosen as the features for abbreviation disambiguation. The classification model is K-NN with weighted voting. Experimental results on a test set of 177,762 Medline abstracts show that the proposed approach achieves higher precision than the approaches reported in related work. The experiments also show that, on the abbreviation disambiguation task, K-NN with weighted voting achieves not only higher precision but also better stability than traditional K-NN.
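
As an illustration of the classification step, the following is a minimal sketch of K-NN with weighted voting over topic-word features. The abstract does not specify the similarity measure or the weighting scheme, so this sketch assumes cosine similarity over bag-of-words vectors and lets each neighbor vote with a weight equal to its similarity to the query. The names `cosine` and `weighted_knn`, the toy samples, and the example abbreviation senses are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_knn(query, samples, k=3):
    """Pick the sense with the largest similarity-weighted vote.

    query   -- list of topic words from the discourse to disambiguate
    samples -- list of (topic_words, sense) pairs; in the paper's setting
               these would be the automatically labeled samples
    k       -- number of nearest neighbors that are allowed to vote
    """
    q = Counter(query)
    scored = [(cosine(q, Counter(words)), sense) for words, sense in samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Assumption: each neighbor's vote is weighted by its similarity,
    # instead of counting one vote per neighbor as in traditional K-NN.
    votes = defaultdict(float)
    for similarity, sense in scored[:k]:
        votes[sense] += similarity
    return max(votes, key=votes.get)


# Toy labeled samples for a hypothetical ambiguous abbreviation.
samples = [
    (["gene", "reporter", "promoter", "plasmid"], "chloramphenicol acetyltransferase"),
    (["gene", "scan", "brain", "patient"], "computed axial tomography"),
    (["plasmid", "imaging", "tumor", "scan"], "computed axial tomography"),
]
query = ["gene", "reporter", "promoter", "plasmid"]

print(weighted_knn(query, samples, k=3))
```

In this toy case, two of the three neighbors carry the second sense, so a plain majority vote would choose it, while the similarity-weighted vote favors the single neighbor that closely matches the query. This is one plausible reading of why weighting can make the classifier less sensitive to distant neighbors.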