Wikipedia Mining for an Association Web Thesaurus Construction

Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has several impressive characteristics: a huge number of articles, live updates, a dense link structure, brief link texts, and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency - Inversed Backward link Frequency), and an extension, "forward / backward link weighting (FB weighting)", in order to construct a large-scale association thesaurus. We evaluated the proposed methods against conventional methods such as co-occurrence analysis and TF-IDF and confirmed their effectiveness.
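The abstract names pfibf but does not give its formula, so the following is only a minimal sketch of what a score of this kind might look like, built by analogy with TF-IDF. Everything specific here is an assumption rather than the paper's definition: the function name `pfibf_sketch`, the `1 / hop` discount on longer walks, the `max_hops` limit, and the `log(N / backward_links)` damping term.

```python
import math
from collections import defaultdict


def pfibf_sketch(links, source, max_hops=2):
    """Score articles reachable from `source` over a link graph.

    `links` maps an article title to the titles it links to (forward links).
    Hypothetical formulation: (discounted walk count) * log(N / backward links).
    """
    # Collect every article that appears as a source or a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n_articles = len(nodes)

    # Backward link frequency: number of distinct articles linking to each target.
    backward = defaultdict(int)
    for targets in links.values():
        for target in set(targets):
            backward[target] += 1

    # Path frequency: count walks of each length, discounting longer ones (assumed 1/hop).
    pf = defaultdict(float)
    frontier = {source: 1}  # article -> number of walks of the current length
    for hop in range(1, max_hops + 1):
        reached = defaultdict(int)
        for node, count in frontier.items():
            for target in set(links.get(node, [])):
                reached[target] += count
        for node, count in reached.items():
            if node != source:
                pf[node] += count / hop
        frontier = reached

    # Damp targets that many articles link to, in the spirit of IDF.
    return {node: p * math.log(n_articles / backward[node]) for node, p in pf.items()}


# Toy link graph (made up for illustration only).
toy_links = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C"]}
toy_scores = pfibf_sketch(toy_links, "A")
```

In this toy graph, "C" is reachable from "A" by more walks than "B", yet it scores lower because three articles link to it. That is the intended effect of an inverse backward link term: widely linked, generic concepts are damped in favor of more specific associations.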
