Instance Selection Techniques for Memory-based Collaborative Filtering

Collaborative filtering (CF) has become an important data mining technique for making personalized recommendations of books, web pages, movies, and other items. One popular approach is memory-based collaborative filtering, which predicts a user's preference from his or her similarity to other users (instances) in the database. However, with the tremendous growth in the number of users and products, memory-based CF algorithms face the problem of deciding which instances to use during prediction, in order to reduce execution cost and storage requirements, and possibly to improve generalization accuracy by avoiding noise and overfitting. In this paper, we focus on a typical user preference database that contains many missing values, and we propose four novel instance reduction techniques, called TURF1-TURF4, as a preprocessing step to improve the efficiency and accuracy of the memory-based CF algorithm. The key idea is to generate predictions from a carefully selected set of relevant instances. We evaluate the techniques on the well-known EachMovie data set. Our experiments show that the proposed algorithms not only dramatically speed up prediction but also improve accuracy.
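To make the memory-based prediction step concrete, here is a minimal sketch assuming one common formulation: Pearson correlation computed only over co-rated items (so missing values are simply skipped), followed by a mean-offset weighted average of the neighbours' ratings. It is not the TURF1-TURF4 method. The `candidates` parameter, the toy `db`, and all function names are hypothetical; `candidates` only marks where a preselected instance subset would plug in, so that prediction scans fewer users.

```python
import math
from typing import Dict, List, Optional

# Rating database: user id -> {item id -> rating}. A missing rating is just an absent key.
Ratings = Dict[str, Dict[str, float]]


def mean_rating(user_ratings: Dict[str, float]) -> float:
    """Average over the ratings a user actually gave."""
    return sum(user_ratings.values()) / len(user_ratings)


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * math.sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den > 0 else 0.0


def predict(db: Ratings, active: str, item: str,
            candidates: Optional[List[str]] = None) -> float:
    """Predict the active user's rating for `item`.

    Hypothetical hook: `candidates` restricts the neighbours to a preselected
    subset of instances; by default every other user in `db` is scanned.
    """
    active_ratings = db[active]
    base = mean_rating(active_ratings)
    pool = candidates if candidates is not None else [u for u in db if u != active]
    num, norm = 0.0, 0.0
    for other in pool:
        other_ratings = db[other]
        # Skip the active user and neighbours with a missing value for this item.
        if other == active or item not in other_ratings:
            continue
        w = pearson(active_ratings, other_ratings)
        num += w * (other_ratings[item] - mean_rating(other_ratings))
        norm += abs(w)
    # With no usable neighbour, fall back to the active user's mean rating.
    return base + num / norm if norm > 0 else base


# Toy database (hypothetical data) with missing values.
db: Ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
    "dave": {"m2": 4, "m3": 1},
}

print(predict(db, "alice", "m4"))                      # about 4.50, uses all users
print(predict(db, "alice", "m4", candidates=["bob"]))  # about 4.25, uses one instance
```

Restricting `candidates` changes both the cost and the answer: with every user scanned, the negatively correlated neighbour pulls the prediction up, while with only `bob` it is about 4.25. Which instances are worth keeping is the question the instance selection techniques address.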
