Augmenting Domain-Specific Thesauri with Knowledge from Wikipedia

We propose a new method for extending a domain-specific thesaurus with valuable information from Wikipedia. The main obstacle is disambiguating thesaurus concepts to the correct Wikipedia articles. Given the concept name, we first identify candidate mappings by analyzing article titles, their redirects, and disambiguation pages. Then, for each candidate, we compute a link-based similarity score to all mappings of context terms related to this concept. The article with the highest score is then used to augment the thesaurus concept. It is the source for the extended gloss explaining the concept's meaning, synonymous expressions that can be used as additional non-descriptors in the thesaurus, translations of the concept into other languages, and new domain-relevant concepts.
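The disambiguation step can be illustrated with a minimal sketch. It assumes that candidate articles and the mappings of the context terms are already known, and that each article is represented by a set of linked article titles. The Jaccard overlap used here is only a stand-in for the link-based similarity score, whose exact formula the abstract does not give. The names `link_similarity`, `disambiguate`, and `LINKS`, as well as the toy data, are hypothetical.

```python
from typing import Dict, List, Set


def link_similarity(links_a: Set[str], links_b: Set[str]) -> float:
    """Jaccard overlap of two link sets (an assumed stand-in measure)."""
    if not links_a or not links_b:
        return 0.0
    return len(links_a & links_b) / len(links_a | links_b)


def disambiguate(candidates: List[str],
                 context_articles: List[str],
                 links: Dict[str, Set[str]]) -> str:
    """Return the candidate with the highest average similarity to the context."""
    def score(candidate: str) -> float:
        if not context_articles:
            return 0.0
        total = sum(
            link_similarity(links.get(candidate, set()), links.get(ctx, set()))
            for ctx in context_articles
        )
        return total / len(context_articles)

    return max(candidates, key=score)


# Toy link sets (hypothetical data, not taken from Wikipedia).
LINKS = {
    "Jaguar": {"Felidae", "Panthera", "South America", "Predation"},
    "Jaguar Cars": {"Automobile", "Coventry", "Tata Motors"},
    "Leopard": {"Felidae", "Panthera", "Africa", "Predation"},
    "Lion": {"Felidae", "Panthera", "Africa"},
}

# The concept name "jaguar" has two candidate articles; its context terms
# in the thesaurus are assumed to map to "Leopard" and "Lion".
best = disambiguate(["Jaguar", "Jaguar Cars"], ["Leopard", "Lion"], LINKS)
```

In this toy setup the animal article shares links with both context articles while the car maker shares none, so the animal article is selected. Averaging over the context articles is one possible way to combine the per-context scores; the abstract does not say how they are aggregated.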
