Paper Information - Efficient Ranking and Computation of Semantic Relatedness and its Application to Word Sense Disambiguation


Wikipedia has grown into a high-quality, up-to-date knowledge base that can enable many intelligent systems relying on semantic information. One of the most general and powerful semantic tools is a measure of semantic relatedness between concepts. Moreover, the ability to efficiently produce a ranked list of similar concepts for a given concept is important for a wide range of applications. We propose a simple measure of similarity between Wikipedia concepts, based on Dice’s measure, and provide very efficient heuristic methods for computing the top-k ranking results. We also present a randomized algorithm that speeds up the evaluation of the measure for a pair of articles. Furthermore, since our heuristics are based on statistical properties of scale-free networks, we show that they are applicable to other complex ontologies. Finally, to evaluate the measure, we apply it to the problem of word sense disambiguation. Our approach to word sense disambiguation is based solely on the similarity measure and produces results with high accuracy.
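As a rough illustration of the kind of measure the abstract refers to, the sketch below computes Dice's coefficient, 2|A ∩ B| / (|A| + |B|), over two sets and ranks candidates by it. It assumes, without confirmation from the abstract, that each concept is represented by the set of articles it is linked with; the names `dice`, `top_k`, and `LINKS`, and the toy data, are hypothetical. The ranking here is a brute-force scan, not the heuristic or randomized methods the paper proposes.

```python
import heapq


def dice(a: set, b: set) -> float:
    """Dice's coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


def top_k(concept: str, links: dict, k: int) -> list:
    """Return the k concepts most similar to `concept` as (score, name) pairs.

    Brute force: scores every other concept. This only shows what a
    top-k query returns, not how to compute it efficiently.
    """
    target = links[concept]
    scored = (
        (dice(target, neighbours), name)
        for name, neighbours in links.items()
        if name != concept
    )
    return heapq.nlargest(k, scored)


# Hypothetical toy link sets (assumption: one set of linked articles per concept).
LINKS = {
    "Jaguar": {"Cat", "Predator", "South America"},
    "Leopard": {"Cat", "Predator", "Africa"},
    "Puma": {"Cat", "Predator", "South America", "Mountain"},
    "Jaguar Cars": {"Car", "United Kingdom"},
}

if __name__ == "__main__":
    for score, name in top_k("Jaguar", LINKS, 2):
        print(f"{name}: {score:.3f}")
```

With this toy data, "Puma" shares three of its four links with "Jaguar" and ranks above "Leopard", while "Jaguar Cars" shares none and scores zero, which hints at how such a measure could separate senses of an ambiguous word.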

Pavel Velikhov | Denis Turdakov | Dmitry Lizorkin | Maxim Grinev
