Towards a Scalable kNN CF Algorithm: Exploring Effective Applications of Clustering

Collaborative Filtering (CF)-based recommender systems benefit both users and the operators of sites that hold more information than users can browse. Users benefit because they can find items of interest among an unmanageable number of available items. E-commerce sites that employ recommender systems, in turn, can increase sales revenue in at least two ways: a) by drawing customers' attention to items that they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical of e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, whose principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose CLUSTKNN, a simple and intuitive algorithm that is well suited for large data sets. The method first compresses the data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly using a simple nearest neighbor-based approach. We demonstrate the feasibility of CLUSTKNN both analytically and empirically. We also show, by comparison with a number of other popular CF algorithms, that apart from being highly scalable and intuitive, CLUSTKNN provides very good recommendation accuracy as well.
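
The two-stage idea described above (compress users into a clustering model, then run a nearest-neighbor step against that model) can be illustrated with a minimal sketch. This is not the authors' implementation: the plain k-means loop, the naive seeding, the root-mean-square distance, the inverse-distance weighting, the toy rating matrix, and the names `build_model` and `predict` are all assumptions chosen for illustration only.

```python
from math import sqrt, isinf

def distance(a, b):
    """Root-mean-square difference over items both vectors have a value for."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def centroid(members, n_items):
    """Per-item mean over the members who rated the item (None if nobody did)."""
    out = []
    for j in range(n_items):
        vals = [m[j] for m in members if m[j] is not None]
        out.append(sum(vals) / len(vals) if vals else None)
    return out

def build_model(ratings, n_clusters, n_iter=10):
    """Stage 1 (assumed plain k-means): compress all users into a few centroids."""
    n_items = len(ratings[0])
    centroids = [list(r) for r in ratings[:n_clusters]]  # naive seeding, an assumption
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for r in ratings:
            nearest = min(range(len(centroids)),
                          key=lambda c: distance(r, centroids[c]))
            groups[nearest].append(r)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [centroid(g, n_items) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def predict(user, item, centroids, k=2):
    """Stage 2: treat the k closest centroids as surrogate neighbors."""
    scored = [(distance(user, c), c) for c in centroids if c[item] is not None]
    scored = [dc for dc in scored if not isinf(dc[0])]
    nearest = sorted(scored, key=lambda dc: dc[0])[:k]
    if not nearest:
        return None
    weights = [1.0 / (1.0 + d) for d, _ in nearest]  # assumed weighting scheme
    total = sum(w * c[item] for w, (_, c) in zip(weights, nearest))
    return total / sum(weights)

# Toy data (hypothetical): rows are users, columns are items, None = unrated.
ratings = [
    [5, 4, 1, None],
    [1, 2, 5, 5],
    [4, 5, None, 1],
    [2, 1, 4, 5],
    [5, 5, 2, 1],
]
model = build_model(ratings, n_clusters=2)
new_user = [5, 4, None, None]
print(predict(new_user, 3, model, k=1))
```

The point of the design is that prediction time depends on the number of centroids rather than the number of users, which is where the scalability claimed in the abstract would come from under these assumptions.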
