Extended Explicit Semantic Analysis for Calculating Semantic Relatedness of Web Resources

Finding semantically similar documents is a common task in recommender systems. Explicit Semantic Analysis (ESA) calculates the semantic relatedness between terms or documents based on their similarity to the documents of a reference corpus, for which Wikipedia is usually used. We propose enhancements to ESA, called Extended Explicit Semantic Analysis, that exploit further semantic properties of Wikipedia, such as its article link structure and categorization. We show how we apply this approach to recommending web resource fragments in a resource-based learning scenario for self-directed, on-task learning with web resources.
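
As an illustration of the basic ESA idea described above (not of the proposed extensions), the following minimal sketch maps each text to a vector of TF-IDF-weighted similarities to the articles of a reference corpus and compares two texts by the cosine of those vectors. The three-article corpus, the whitespace tokenizer, and all function names are assumptions made for this example; a real setup would index Wikipedia articles instead.

```python
import math
from collections import Counter

# Hypothetical stand-in for the reference corpus (concept title -> article text).
# In ESA proper, each concept would be a Wikipedia article.
CONCEPTS = {
    "Cat": "cat feline pet animal purr whiskers",
    "Dog": "dog canine pet animal bark leash",
    "Computer": "computer software hardware program keyboard code",
}


def tokenize(text):
    """Very simple tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_index(concepts):
    """Build an inverted index {term: {concept: tf-idf weight}}."""
    n = len(concepts)
    tf = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    # Document frequency: in how many concept articles each term occurs.
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for concept, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n / df[term])
            index.setdefault(term, {})[concept] = freq * idf
    return index


def concept_vector(text, index):
    """Map a text to a weighted vector over concepts by summing term weights."""
    vec = Counter()
    for term in tokenize(text):
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dict-like objects."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relatedness(a, b, index):
    """Relatedness of two texts as the cosine of their concept vectors."""
    return cosine(concept_vector(a, index), concept_vector(b, index))


index = build_index(CONCEPTS)
pets = relatedness("cat pet", "dog pet", index)
unrelated = relatedness("cat pet", "computer code", index)
```

In this toy corpus, "cat pet" and "dog pet" share weight on the Cat and Dog concepts, so `pets` is greater than `unrelated`, which is zero because the two texts activate disjoint concepts. How link structure and categorization would adjust these concept vectors is not specified in the text above and is therefore left out of the sketch.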
