User-Based Collaborative-Filtering Recommendation Algorithms on Hadoop

Collaborative filtering (CF) algorithms are widely used in recommender systems; however, their high computational complexity hinders their use in large-scale systems. In this paper, we implement a user-based CF algorithm on a cloud computing platform, namely Hadoop, to solve the scalability problem of CF. Experimental results show that a simple method that partitions users into groups can achieve linear speedup, provided it follows two basic principles: (1) the number of mappers is arranged carefully so that the cost of initializing mappers is kept in check, and (2) the work is partitioned equally so that all processors finish their tasks at the same time.
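
The partitioning idea described in the abstract can be illustrated with a minimal single-process sketch. It does not use Hadoop and is not the authors' implementation: the function names (`partition_users`, `cosine`, `map_group`), the cosine similarity measure, the neighbourhood size `k`, and the toy ratings are all assumptions made for illustration. Each call to `map_group` stands in for one mapper that handles one group of users.

```python
from math import sqrt


def partition_users(user_ids, num_mappers):
    """Split user IDs into num_mappers groups whose sizes differ by at most one."""
    groups = [[] for _ in range(num_mappers)]
    for i, uid in enumerate(sorted(user_ids)):
        groups[i % num_mappers].append(uid)  # round-robin keeps the groups balanced
    return groups


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_group(group, ratings, k=2, top_n=1):
    """Hypothetical mapper: compute recommendations for every user in one group."""
    out = {}
    for u in group:
        # The k most similar other users (assumed neighbourhood rule).
        neighbours = sorted(
            ((cosine(ratings[u], ratings[v]), v) for v in ratings if v != u),
            reverse=True,
        )[:k]
        # Score unseen items by similarity-weighted neighbour ratings.
        scores = {}
        for sim, v in neighbours:
            for item, rating in ratings[v].items():
                if item not in ratings[u]:
                    scores[item] = scores.get(item, 0.0) + sim * rating
        out[u] = sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
    return out


# Toy data, invented for this example.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "b": 3, "c": 5},
    "u3": {"b": 2, "c": 4, "d": 1},
    "u4": {"a": 1, "d": 5},
}

groups = partition_users(ratings, num_mappers=2)
recommendations = {}
for g in groups:  # on a cluster, each group would go to a separate mapper
    recommendations.update(map_group(g, ratings))
```

Round-robin assignment is only one way to balance the groups. It equalizes the number of users per mapper, which balances the work only if users cost roughly the same to process; a real job would probably also need to tune `num_mappers` against mapper start-up cost.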
