Improving User Modelling with Content-Based Techniques

SiteIF is a personal agent for a bilingual news web site that learns a user's interests from the pages the user requests. In this paper we propose using a content-based document representation as the starting point for building a model of the user's interests. As the user browses, the documents visited are processed, and their relevant senses (disambiguated over WordNet) are extracted and combined to form a semantic network. A filtering procedure then uses this network to dynamically predict which new documents will interest the user. A content-based approach has two main advantages: first, the model's predictions, being based on senses rather than words, are more accurate; second, the model is language independent, allowing navigation in multilingual sites. We report the results of a comparative experiment carried out to give a quantitative estimate of these improvements.
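
As an illustration only, the pipeline described above can be sketched in a few lines of Python. The sketch assumes that word sense disambiguation has already mapped each document to a list of sense identifiers. The class name SenseNetwork, the identifier strings, and the co-occurrence weighting and scoring scheme are hypothetical choices made for this example and are not the SiteIF implementation.

```python
from collections import defaultdict
from itertools import combinations


class SenseNetwork:
    """Toy user model (hypothetical): nodes are sense identifiers,
    edges link senses that co-occur in a requested document."""

    def __init__(self):
        self.node_weight = defaultdict(float)
        self.edge_weight = defaultdict(float)

    def update(self, senses):
        """Reinforce the model with the senses of a page the user requested."""
        unique = sorted(set(senses))
        for sense in unique:
            self.node_weight[sense] += 1.0
        # Sorted order gives each unordered pair a single canonical key.
        for pair in combinations(unique, 2):
            self.edge_weight[pair] += 1.0

    def score(self, senses):
        """Score a candidate document: sum of matched node and edge weights,
        divided by the number of distinct senses in the document."""
        unique = sorted(set(senses))
        if not unique:
            return 0.0
        total = sum(self.node_weight.get(s, 0.0) for s in unique)
        total += sum(self.edge_weight.get(p, 0.0) for p in combinations(unique, 2))
        return total / len(unique)


# Assumed sense identifiers; a real system would use WordNet synset IDs.
model = SenseNetwork()
model.update(["bank#finance", "loan#credit", "interest#rate"])
model.update(["bank#finance", "stock#market"])

finance_doc = ["bank#finance", "loan#credit"]
sports_doc = ["goal#soccer", "match#game"]
print(model.score(finance_doc) > model.score(sports_doc))
```

Because the model stores senses rather than words, an English and an Italian article that disambiguate to the same senses receive the same score in this sketch, which is the sense in which such a model is language independent.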