Leveraging clustering to improve collaborative filtering

Extensive work has recently been done on matrix factorization (MF) techniques, as they provide accurate rating prediction models for recommendation systems. Extensions such as neighbour-aware models have been shown to improve rating prediction further. However, these models often suffer from long computation times. In this paper, we propose a novel method that applies clustering algorithms to the latent vectors of users and items. Our method captures the common interests between clusters of users and clusters of items in a latent space, expressed as a cluster-level rating matrix. A matrix factorization technique is then applied to this cluster-level rating matrix to predict future cluster-level interests. Finally, we aggregate the traditional user-item rating predictions with our cluster-level rating predictions to improve rating prediction accuracy. Our method is a general “wrapper” that can be applied to any collaborative filtering method. In our experiments, we show that our approach, when applied to a variety of existing matrix factorization techniques, improves their rating predictions and also yields better rating predictions for cold-start users. Above all, we show that higher-quality clusters and a larger number of clusters lead to better rating prediction accuracy.
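The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say which MF model, clustering algorithm, cluster-level aggregation, or blending rule is used, so every one of these is an assumption here: plain SGD matrix factorization as the base model, Lloyd's k-means on the learned latent vectors, the mean observed rating per (user cluster, item cluster) cell as the cluster-level rating matrix, and a fixed blending weight `alpha`. The names `factorize`, `kmeans`, and `cluster_wrapper` are hypothetical and do not come from the paper.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(triples, n_rows, n_cols, k=4, epochs=200, lr=0.02, reg=0.02, seed=0):
    """ASSUMPTION: plain SGD MF, r_hat = mu + p_u . q_i, standing in for any MF model."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in triples) / len(triples)  # global mean rating
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_rows)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_cols)]
    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - (mu + dot(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def kmeans(vectors, n_clusters, iters=20, seed=0):
    """ASSUMPTION: Lloyd's k-means on latent vectors; returns one cluster id per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest center (squared Euclidean distance).
        labels = [
            min(range(n_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_wrapper(triples, n_users, n_items, n_user_clusters=2, n_item_clusters=2, alpha=0.5):
    """Hypothetical wrapper: blend user-item MF with MF on a cluster-level rating matrix."""
    mu, P, Q = factorize(triples, n_users, n_items)
    u_lab = kmeans(P, n_user_clusters)  # cluster users by their latent vectors
    i_lab = kmeans(Q, n_item_clusters)  # cluster items by their latent vectors
    # ASSUMPTION: cluster-level rating = mean observed rating in each (user cluster, item cluster) cell.
    sums, counts = {}, {}
    for u, i, r in triples:
        cell = (u_lab[u], i_lab[i])
        sums[cell] = sums.get(cell, 0.0) + r
        counts[cell] = counts.get(cell, 0) + 1
    cluster_triples = [(cu, ci, sums[(cu, ci)] / counts[(cu, ci)]) for (cu, ci) in sums]
    cmu, CP, CQ = factorize(cluster_triples, n_user_clusters, n_item_clusters, k=2)

    def predict(u, i):
        base = mu + dot(P[u], Q[i])  # ordinary user-item prediction
        cluster = cmu + dot(CP[u_lab[u]], CQ[i_lab[i]])  # cluster-level prediction
        # ASSUMPTION: a fixed linear blend; the paper's aggregation rule is not given here.
        return alpha * base + (1 - alpha) * cluster

    return predict


# Toy (user, item, rating) triples: users 0-1 favour items 0-1, users 2-3 favour items 2-3.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 2),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 2),
]
predict = cluster_wrapper(ratings, n_users=4, n_items=4)
print(round(predict(0, 3), 2))  # unobserved pair: user 0, item 3
```

Because `cluster_wrapper` only consumes the latent vectors and predictions of the base model, `factorize` could be swapped for a different MF variant, which loosely mirrors the abstract's description of the method as a wrapper. This sketch uses a single clustering; running it with several cluster counts would be one way to explore the abstract's claim about the number of clusters.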
