Jester 2.0 (poster abstract): evaluation of a new linear time collaborative filtering algorithm

Jester is a WWW-based system that allows users to retrieve jokes based on their ratings of sample jokes. Our emphasis is on a new linear time collaborative filtering algorithm, based on principal component analysis (PCA) and clustering, for efficient and effective personalized information retrieval. Let m be the number of users in the database (currently over 12,000) and n be the number of jokes rated by a user to characterize his or her preferences (currently 10). We report new results comparing Jester 1.0’s O(nm) algorithm with Jester 2.0’s O(n) algorithm: the latter improves retrieval effectiveness by more than 40% and reduces retrieval time by a factor of 12,000. To try Jester, please visit: http://shadow.ieor.berkeley.edu/humor

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR ’99 8/99 Berkeley, CA, USA

1. PROBLEM DEFINITION

Collaborative filtering, also known as recommender systems [3], offers promising techniques for personalized information retrieval when preferences are difficult to characterize semantically [1,2,4]. The classic collaborative filtering problem has the following structure: given a set of objects with associated ratings, the object space is divided into two sets, the predictor set and the recommendation set. A new user rates all the objects in the predictor set. Based on these ratings, objects are retrieved from the recommendation set. The system should be:

1. Effective: recommended objects should receive high ratings.
2. Efficient: the online recommendation process should run quickly.

Konstan, Miller et al. [2] implemented GroupLens, one of the first systems for rating objects (postings) from a variety of Usenet newsgroups, including rec.humor. They developed a simple but computationally intensive prediction algorithm based on weighted correlations and reported their results for a relatively small number of users.

Breese et al. [4] at Microsoft Research classify collaborative filtering algorithms into two distinct classes, memory-based and model-based: memory-based algorithms operate over the entire user database to make predictions, whereas model-based algorithms learn a model, which is then used for predictions. Jester 1.0 is memory-based. Jester 2.0 is model-based.
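
To make the memory-based class described above concrete, the following is a minimal Python sketch of a generic correlation-weighted predictor. It is an illustration only: it is not the Jester 1.0 or GroupLens implementation, and the function names (`pearson`, `recommend`), the parameter names, and the toy ratings are all assumptions introduced here. It shows why such a method costs O(nm) per query: the new user's n predictor-set ratings are compared against every one of the m stored users. Scoring the recommendation set adds further work proportional to its size.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length rating lists (0.0 if undefined)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / sqrt(var_a * var_b)


def recommend(new_predictor, db_predictor, db_recommend, top_k=1):
    """Rank recommendation-set objects for a new user (hypothetical helper).

    new_predictor: the new user's n ratings of the predictor set
    db_predictor:  m lists of n predictor-set ratings, one per stored user
    db_recommend:  m lists of recommendation-set ratings, one per stored user
    Returns (indices of the top_k objects, score of every object).
    """
    # One correlation per stored user: m users times n ratings = O(nm) work.
    weights = [pearson(new_predictor, row) for row in db_predictor]
    norm = sum(abs(w) for w in weights)

    num_objects = len(db_recommend[0])
    scores = []
    for j in range(num_objects):
        # Correlation-weighted average of the stored users' ratings of object j.
        total = sum(w * row[j] for w, row in zip(weights, db_recommend))
        scores.append(total / norm if norm else 0.0)

    ranked = sorted(range(num_objects), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k], scores


if __name__ == "__main__":
    # Toy data (made up): 3 stored users, 3 predictor objects, 2 candidates.
    db_predictor = [[5, 1, 4], [1, 5, 2], [4, 2, 5]]
    db_recommend = [[9, 1], [1, 9], [8, 2]]
    top, scores = recommend([5, 1, 5], db_predictor, db_recommend)
    print(top, scores)
```

In this toy run the new user correlates positively with the first and third stored users, so the first candidate object is ranked highest. A model-based method would instead precompute a compact model offline so that the online step no longer scans all m users.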