Using linear classifiers in the integration of user modeling and text content analysis in the person

Many newspapers and news agencies now offer personalized information access services, and interest in improving these services is growing. In this paper we present a methodology for improving the intelligent personalization of news services and describe how it has been applied to a prominent Spanish newspaper, ABC. Our methodology integrates textual content analysis tasks and machine learning techniques to build an elaborate user model that represents short-term needs and long-term multi-topic interests separately. The characterization of a user’s interests includes preferences about structure (newspaper sections), content, and information delivery. A wide-coverage, domain-independent classification of topics and a personal set of keywords allow users to define their content preferences. Machine learning techniques are used to obtain an initial representation of each category in the topic classification. Finally, we give some details about the Mercurio system, which is being used to implement this methodology for ABC. We describe our experience with the system and an evaluation of it in comparison with other commercial systems.
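As a rough illustration of how sections, categories, and keywords might be combined in a long-term user model, the following minimal sketch scores a news item against a profile. It is not the Mercurio implementation: the centroid-based category representation, the cosine similarity, the combination weights, and all names (`term_vector`, `centroid`, `score`, the dictionary layout of `user` and `item`) are assumptions made for illustration only, and short-term needs are not modeled here.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(docs):
    """Average term vector of training documents: an assumed, simple way to
    obtain an initial linear representation of a category."""
    if not docs:
        return {}
    total = Counter()
    for doc in docs:
        total.update(term_vector(doc))
    return {t: c / len(docs) for t, c in total.items()}


def score(item, user, category_vectors, weights=(0.2, 0.4, 0.4)):
    """Weighted sum of section, category, and keyword relevance.
    The weights are hypothetical, not values taken from the paper."""
    w_sec, w_cat, w_kw = weights
    vec = term_vector(item["text"])
    # Section preference: a per-section interest level in [0, 1].
    sec = user["sections"].get(item["section"], 0.0)
    # Category preference: best match among the categories the user selected.
    cat = max(
        (interest * cosine(vec, category_vectors[name])
         for name, interest in user["categories"].items()),
        default=0.0,
    )
    # Keyword preference: similarity to the user's personal keyword weights.
    kw = cosine(vec, user["keywords"])
    return w_sec * sec + w_cat * cat + w_kw * kw
```

In such a sketch, items would be ranked by `score` for each user; keeping the three components separate makes it possible to adjust how much each interest specification contributes.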
