Scaling-Up Item-Based Collaborative Filtering Recommendation Algorithm Based on Hadoop

Collaborative filtering (CF) techniques have achieved widespread success in e-commerce. The tremendous growth in the number of customers and products in recent years poses key challenges for recommender systems, which must deliver high-quality recommendations while performing more recommendations per second for millions of customers and products. Improving the scalability and efficiency of CF algorithms has therefore become increasingly important and increasingly difficult. In this paper, we developed and implemented a scaling-up item-based CF algorithm on MapReduce by splitting the three most costly computations in the algorithm into four MapReduce phases, each of which can be executed independently and in parallel on different nodes. We also proposed efficient partition strategies that not only enable parallel computation in each MapReduce phase but also maximize data locality to minimize communication cost. Experimental results on a Hadoop cluster showed that the item-based CF algorithm achieves good scalability and efficiency.
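The abstract does not say what each of the four phases computes, so the following Python sketch is only one hypothetical way to split item-based CF into four map/reduce steps; it is not the authors' implementation and does not use the Hadoop API. It simulates each phase in memory with a toy `map_reduce` helper (a hypothetical name), and it assumes the vector-cosine item similarity and weighted-sum prediction that are commonly used in item-based CF. In this sketch, phase 2, which emits one record per co-rated item pair for every user, is the most expensive step.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical sketch: the four phases below are an illustrative decomposition,
# not the paper's actual MapReduce jobs. Everything runs in memory.


def map_reduce(records, mapper, reducer):
    """Simulate one MapReduce phase: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect all values per key
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def item_based_cf(ratings):
    """ratings: iterable of (user, item, rating) triples."""
    # Phase 1: group ratings by user -> (user, {item: rating}).
    user_vectors = map_reduce(
        ratings,
        lambda r: [(r[0], (r[1], r[2]))],
        lambda user, items: [(user, dict(items))],
    )

    # Phase 2: per-user partial dot products for co-rated item pairs and
    # partial squared norms per item; the reducer sums the partials.
    def pair_mapper(record):
        _, vec = record
        for i, r in vec.items():
            yield ("norm", i), r * r
        for i, j in combinations(sorted(vec), 2):
            yield ("dot", i, j), vec[i] * vec[j]

    partials = dict(map_reduce(user_vectors, pair_mapper,
                               lambda key, vals: [(key, sum(vals))]))
    norms = {k[1]: sqrt(v) for k, v in partials.items() if k[0] == "norm"}

    # Phase 3: cosine similarity, emitted in both directions
    # -> (item, {neighbor: similarity}). `norms` acts as side data here.
    def sim_mapper(record):
        key, dot = record
        if key[0] == "dot":
            _, i, j = key
            s = dot / (norms[i] * norms[j])
            yield i, (j, s)
            yield j, (i, s)

    sims = dict(map_reduce(partials.items(), sim_mapper,
                           lambda item, nbrs: [(item, dict(nbrs))]))

    # Phase 4: weighted-sum prediction for every item the user has not rated,
    # using only neighbors the user has rated.
    def predict_mapper(record):
        user, vec = record
        for i, nbrs in sims.items():
            if i in vec:
                continue
            num = sum(s * vec[j] for j, s in nbrs.items() if j in vec)
            den = sum(abs(s) for j, s in nbrs.items() if j in vec)
            if den > 0:
                yield user, (i, num / den)

    predictions = dict(map_reduce(user_vectors, predict_mapper,
                                  lambda user, preds: [(user, dict(preds))]))
    return sims, predictions


if __name__ == "__main__":
    demo = [("u1", "a", 5), ("u1", "b", 3), ("u1", "c", 4),
            ("u2", "a", 4), ("u2", "b", 2),
            ("u3", "b", 5), ("u3", "c", 1)]
    sims, preds = item_based_cf(demo)
    print(preds)
```

In a real Hadoop deployment, each phase would presumably be a separate job reading the previous phase's output from HDFS, and side data such as `norms` and `sims` would have to be shipped to the mappers (for example, through Hadoop's distributed cache) instead of being captured from a Python closure.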
