User Modelling for News Web Sites with Word Sense Based Techniques

SiteIF is a personal agent for a bilingual news web site that learns a user's interests from the pages the user requests. In this paper we propose using a word sense based document representation as the starting point for building a model of the user's interests. Browsed documents are processed, and their relevant senses (disambiguated over WordNet) are extracted and combined to form a semantic network. A filtering procedure then uses the semantic network to dynamically predict which new documents are likely to interest the user. A sense-based approach has two main advantages: first, the model's predictions, being based on senses rather than words, are more accurate; second, the model is language independent, which allows navigation in multilingual sites. We report the results of a comparative experiment carried out to quantify these improvements.
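The pipeline described above (senses extracted from browsed documents, combined into a semantic network, then used to filter new documents) can be illustrated with a minimal sketch. This is not SiteIF's actual implementation: the class name SenseUserModel, the co-occurrence weighting, the additive scoring rule, and the sense identifiers are all assumptions made for illustration, and documents are assumed to have already been disambiguated into sense identifiers.

```python
from collections import defaultdict
from itertools import combinations


class SenseUserModel:
    """Toy user model (hypothetical): a weighted graph over sense identifiers."""

    def __init__(self):
        # sense -> how often it appeared in browsed documents
        self.node_weight = defaultdict(float)
        # unordered pair of senses -> how often they co-occurred in a document
        self.edge_weight = defaultdict(float)

    def update(self, doc_senses):
        """Fold the senses of one browsed document into the network."""
        senses = set(doc_senses)
        for sense in senses:
            self.node_weight[sense] += 1.0
        for a, b in combinations(sorted(senses), 2):
            self.edge_weight[frozenset((a, b))] += 1.0

    def score(self, doc_senses):
        """Score a new document by its overlap with the network (assumed rule)."""
        senses = set(doc_senses)
        node_score = sum(self.node_weight.get(s, 0.0) for s in senses)
        edge_score = sum(
            self.edge_weight.get(frozenset((a, b)), 0.0)
            for a, b in combinations(sorted(senses), 2)
        )
        return node_score + edge_score


# Made-up sense identifiers; a real system would use WordNet synsets.
model = SenseUserModel()
model.update(["election#n#1", "parliament#n#1", "vote#n#1"])
model.update(["election#n#1", "vote#n#1"])

candidates = {
    "politics": ["election#n#1", "vote#n#1"],
    "sport": ["football#n#1", "goal#n#1"],
}
ranked = sorted(candidates, key=lambda name: model.score(candidates[name]), reverse=True)
```

Because the model stores senses rather than words, an English and an Italian document that map to the same sense identifiers would receive the same score under this sketch, which is the language independence the abstract refers to.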
