Semantic relatedness measurement based on Wikipedia link co-occurrence analysis

Purpose – Several recent studies have shown the importance and effectiveness of Wikipedia Mining. One popular research area in Wikipedia Mining is semantic relatedness measurement, and work in this area has shown that Wikipedia can be used for this task. However, previous methods face two problems: accuracy and scalability. To solve these problems, this paper proposes an efficient semantic relatedness measurement method that leverages global statistical information of Wikipedia. Furthermore, a new test collection is constructed based on Wikipedia concepts for evaluating semantic relatedness measurement methods.

Design/methodology/approach – The authors' approach leverages global statistical information of the whole Wikipedia to compute semantic relatedness among concepts (disambiguated terms) by analyzing co-occurrences of link pairs in all Wikipedia articles. In Wikipedia, an article represents a concept and a link to another articl...
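
The idea of analyzing co-occurrences of link pairs across articles can be illustrated with a minimal sketch. The abstract is truncated before it states the exact weighting the authors use, so the scoring below is an assumption: a plain cosine similarity over binary "concept is linked from article" vectors, swapped in for illustration only. The function names (`link_cooccurrence`, `relatedness`) and the toy `articles` data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def link_cooccurrence(articles):
    """Count link and link-pair frequencies over all articles.

    `articles` maps an article title to the list of concepts it links to.
    Each concept is counted at most once per article.
    """
    link_freq = Counter()  # number of articles that link to a concept
    pair_freq = Counter()  # number of articles that link to both concepts
    for links in articles.values():
        unique = sorted(set(links))
        link_freq.update(unique)
        # Pairs are stored in sorted order so (a, b) and (b, a) share a key.
        pair_freq.update(combinations(unique, 2))
    return link_freq, pair_freq


def relatedness(a, b, link_freq, pair_freq):
    """Cosine similarity of two concepts (assumed measure, not the paper's)."""
    if a == b:
        return 1.0
    key = (a, b) if a < b else (b, a)
    co = pair_freq[key]
    if co == 0:
        return 0.0
    return co / sqrt(link_freq[a] * link_freq[b])


# Toy data: four hypothetical articles and their outgoing links.
articles = {
    "Article 1": ["Apple Inc.", "Steve Jobs", "iPhone"],
    "Article 2": ["Apple Inc.", "iPhone"],
    "Article 3": ["Apple Inc.", "Steve Jobs"],
    "Article 4": ["Banana", "Fruit"],
}

link_freq, pair_freq = link_cooccurrence(articles)
print(relatedness("Apple Inc.", "iPhone", link_freq, pair_freq))
print(relatedness("Apple Inc.", "Banana", link_freq, pair_freq))
```

Because the counts come from a single pass over every article, the statistics are global in the sense the abstract describes; concepts that are never linked from the same article score zero under this assumed measure.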
