Paper Information - Wikipedia Relatedness Measurement Methods and Influential Features

As a corpus for knowledge extraction, Wikipedia has become a promising resource for researchers in domains such as NLP, WWW, IR and AI, because it offers broad coverage of concepts across a wide range of domains, remarkable accuracy, and a structure that is easy to analyze. Measuring relatedness among concepts is one of the traditional research topics in Wikipedia analysis. The value of this research is widely recognized because of its wide range of applications, such as query expansion in IR and context recognition in WSD (Word Sense Disambiguation). A number of approaches have been proposed, and they have shown that many features can be used to measure relatedness among concepts in Wikipedia. In previous research, features such as categories, co-occurrence of terms (links), inter-page links and Infoboxes have been used for this purpose. What seems to be lacking, however, is an integrated feature selection model for these dispersed features, since it is still unclear which features are influential and how they can be integrated to achieve higher accuracy. This position paper proposes an SVR (Support Vector Regression) based integrated feature selection model to investigate the influence of each feature and to seek a combined model of features that achieves high accuracy and coverage.
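The idea of probing each feature's influence with a regression model can be illustrated with a small ablation sketch. This is not the authors' implementation: it is a minimal, standard-library-only linear SVR trained by subgradient descent on the epsilon-insensitive loss, run on synthetic data. The feature names, the toy weights (0.7 and 0.3), and all function names and hyperparameters are assumptions made up for illustration; they are not results from the paper.

```python
import random

# Hypothetical feature names, loosely following the features named in the abstract.
FEATURES = ["link_cooccurrence", "category_overlap", "infobox_overlap"]


def make_toy_pairs(n, rng):
    """Synthetic concept pairs: feature values in [0, 1] plus a made-up target score."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in FEATURES]
        X.append(row)
        # Arbitrary illustrative weights (NOT findings from the paper);
        # the third feature is deliberately uninformative.
        y.append(0.7 * row[0] + 0.3 * row[1])
    return X, y


def predict(w, b, row):
    return sum(wj * xj for wj, xj in zip(w, row)) + b


def train_linear_svr(X, y, epsilon=0.02, lam=1e-3, lr=0.005, epochs=400, seed=0):
    """Linear SVR: minimize (lam/2)*||w||^2 + epsilon-insensitive loss by SGD."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = predict(w, b, X[i]) - y[i]
            # Subgradient of the loss: zero inside the epsilon tube, sign(err) outside.
            g = 0.0 if abs(err) <= epsilon else (1.0 if err > 0 else -1.0)
            w = [wj - lr * (lam * wj + g * xj) for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def mean_abs_error(w, b, X, y):
    return sum(abs(predict(w, b, row) - t) for row, t in zip(X, y)) / len(y)


def evaluate(keep, train, test):
    """Train and score a model that sees only the feature columns listed in keep."""
    def select(X):
        return [[row[j] for j in keep] for row in X]

    (X_train, y_train), (X_test, y_test) = train, test
    w, b = train_linear_svr(select(X_train), y_train)
    return mean_abs_error(w, b, select(X_test), y_test)


rng = random.Random(42)
train, test = make_toy_pairs(80, rng), make_toy_pairs(40, rng)
all_cols = list(range(len(FEATURES)))

# Baseline with every feature, then one run per left-out feature.
mae_all = evaluate(all_cols, train, test)
mae_without = {
    name: evaluate([j for j in all_cols if j != k], train, test)
    for k, name in enumerate(FEATURES)
}

print(f"all features: MAE = {mae_all:.3f}")
for name, mae in mae_without.items():
    print(f"without {name}: MAE = {mae:.3f}")
```

A feature whose removal raises the held-out error the most is the most influential one under this toy setup. A real study would replace the synthetic pairs with human-rated relatedness scores and would likely use a kernel SVR from an established library rather than this hand-rolled linear variant.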

Takahiro Hara | Shojiro Nishio | Masahiro Ito | Kotaro Nakayama
