An Effective Complete-Web Recommender System

A number of recommendation systems can suggest webpages, within a single website, that other (purportedly similar) users have visited. By contrast, our goal is a system that can recommend "information content" (IC) pages, i.e., pages that contain information relevant to the user, from anywhere on the web. This paper describes how we addressed this challenge. We first collected a number of annotated user sessions, in which each page carries a bit indicating whether it was an IC-page. Our system, ICPF, then used this collection to learn the characteristics of words that appear in such IC-pages, in terms of each word's "browsing features" (e.g., did the user follow links whose anchor included this word, etc.). This paper describes the ICPF system, as well as a tool (AIE) we developed to help users annotate their sessions, and a study we performed to collect these annotated sessions. We also present empirical data that validate the effectiveness of this approach.
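To make the notion of a word's "browsing features" concrete, the following is a minimal sketch, not the ICPF implementation. It assumes a hypothetical `PageVisit` record for each page in an annotated session and computes two illustrative features per word: how many visited pages contained the word, and how often it appeared in the anchor text of a link the user followed. The record fields, feature names, and function names are all assumptions made for illustration; the actual feature set used by ICPF is not specified in this abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class PageVisit:
    """One page in an annotated session (hypothetical structure)."""
    url: str
    words: List[str]            # words appearing on the page
    followed_anchor: str = ""   # anchor text of the link followed from this page
    is_ic: bool = False         # annotation bit: was this an IC-page?


def browsing_features(session):
    """Map each word to two illustrative browsing-feature counts."""
    feats = defaultdict(lambda: {"seen_on_pages": 0, "in_followed_anchor": 0})
    for visit in session:
        # Count each word at most once per page.
        for w in {w.lower() for w in visit.words}:
            feats[w]["seen_on_pages"] += 1
        # Count each word at most once per followed anchor.
        for w in set(visit.followed_anchor.lower().split()):
            feats[w]["in_followed_anchor"] += 1
    return dict(feats)


def labelled_examples(session):
    """Pair each word's feature values with whether it occurs on an IC-page."""
    ic_words = {w.lower() for v in session if v.is_ic for w in v.words}
    feats = browsing_features(session)
    return [
        ((f["seen_on_pages"], f["in_followed_anchor"]), word in ic_words)
        for word, f in feats.items()
    ]


session = [
    PageVisit("page-a", ["cheap", "flights"], followed_anchor="Cheap flights"),
    PageVisit("page-b", ["flights", "Paris"], is_ic=True),
]
examples = labelled_examples(session)
```

Labelled examples of this shape could then be passed to any standard classifier. Because the features describe how a word was encountered while browsing rather than which word it is, a model trained this way is not tied to a particular site or vocabulary.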
