A collaborative filtering recommendation engine in a distributed environment

The tremendous growth of information available on the Internet has made finding useful information a challenge, so intelligent approaches are needed to help users efficiently locate and retrieve information from the Web. Recommender systems now recommend everything from movies, books, music, restaurants and news to jokes. Collaborative filtering (CF) algorithms are among the most successful recommendation techniques; they present information on items and products that match a user's interests. There are two main CF methods: user-based CF and item-based CF. The former infers a target user's interests by finding other users with similar interests, whereas item-based CF examines the set of items rated by all users and computes how similar each of them is to the target item under recommendation. This paper develops a model that splits the costly computations in CF algorithms into three Map-Reduce phases, each of which can be executed independently on different nodes in parallel. Similarity is computed with the Pearson correlation coefficient, which measures how linearly two items relate to each other and takes a value between -1 and +1 inclusive. The paper also compares implementations of item-based and user-based CF on the Map-Reduce framework. Experimental results show that the running time of the algorithm improves by approximately 30% with each node added to a Hadoop cluster, and that item-based CF scales better than user-based CF.
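For reference, the standard item-item form of the Pearson correlation is given below. Here $U_{ij}$ is the set of users who rated both items $i$ and $j$, $r_{u,i}$ is user $u$'s rating of item $i$, and $\bar{r}_i$ is the mean rating of item $i$. Taking $\bar{r}_i$ over $U_{ij}$ only is an assumption in this sketch; some implementations use the item's mean over all of its ratings instead.

$$
\operatorname{sim}(i,j) = \frac{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (r_{u,j} - \bar{r}_j)^2}}
$$

By the Cauchy-Schwarz inequality the value lies in $[-1, +1]$. For user-based CF the roles are swapped: the sums run over the items rated by both users, and the means are per-user rather than per-item.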
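The abstract does not say what the three Map-Reduce phases are, so the following is only a hypothetical decomposition for item-based CF, simulated in memory in Python rather than run on Hadoop. Phase 1 groups (user, item, rating) triples into per-user rating vectors, phase 2 emits every co-rated item pair and computes its Pearson correlation, and phase 3 predicts each user's unrated items as a similarity-weighted average of the items that user did rate. All names (`run_phase`, `item_based_cf`, and the mapper/reducer functions) are illustrative. Keeping only positive similarities in phase 3 is also an assumption, chosen so that the weighted average stays within the rating scale.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def run_phase(records, mapper, reducer):
    """In-memory stand-in for one Map-Reduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect all values per key
    results = []
    for key in sorted(groups):  # Hadoop likewise sorts keys before reducing
        results.extend(reducer(key, groups[key]))
    return results


# Phase 1: group (user, item, rating) triples into one rating vector per user.
def map_by_user(triple):
    user, item, rating = triple
    yield user, (item, rating)


def reduce_user_vector(user, item_ratings):
    yield user, dict(item_ratings)


# Phase 2: each user emits every pair of items they rated; the reducer computes
# the Pearson correlation of that pair over all users who rated both items.
def map_item_pairs(user_vector):
    _, ratings = user_vector
    for i, j in combinations(sorted(ratings), 2):
        yield (i, j), (ratings[i], ratings[j])


def reduce_pearson(pair, co_ratings):
    if len(co_ratings) < 2:
        return  # too few co-rating users to correlate
    xs = [x for x, _ in co_ratings]
    ys = [y for _, y in co_ratings]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in co_ratings)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx and vy:  # skip pairs where one item has constant ratings
        yield pair, cov / sqrt(vx * vy)


# Phase 3: the (assumed small) similarity table is handed to every mapper;
# each user's unrated items are predicted from positively similar rated items.
def make_predict_mapper(sims):
    neighbours = defaultdict(list)
    for (i, j), s in sims:
        if s > 0:  # assumption: only positively correlated neighbours are used
            neighbours[i].append((j, s))
            neighbours[j].append((i, s))

    def map_predict(user_vector):
        user, ratings = user_vector
        for item in neighbours:
            if item in ratings:
                continue
            rated = [(s, ratings[j]) for j, s in neighbours[item] if j in ratings]
            norm = sum(abs(s) for s, _ in rated)
            if norm:
                yield (user, item), sum(s * r for s, r in rated) / norm

    return map_predict


def reduce_identity(key, values):
    yield key, values[0]  # one prediction per (user, item) key


def item_based_cf(triples):
    users = run_phase(triples, map_by_user, reduce_user_vector)
    sims = run_phase(users, map_item_pairs, reduce_pearson)
    preds = run_phase(users, make_predict_mapper(sims), reduce_identity)
    return dict(sims), dict(preds)


if __name__ == "__main__":
    ratings = [
        ("u1", "A", 5), ("u1", "B", 4), ("u1", "C", 1),
        ("u2", "A", 3), ("u2", "B", 2), ("u2", "C", 5),
        ("u3", "A", 4), ("u3", "B", 3), ("u3", "C", 3),
        ("u4", "A", 2), ("u4", "C", 4),  # u4 has not rated item B
    ]
    sims, preds = item_based_cf(ratings)
    print(preds)  # {('u4', 'B'): 2.0}
```

Within each phase, every mapper call depends only on its own input record and every reducer call only on its own key, which is what lets a real framework spread a phase across nodes. In the toy data, items A and B are perfectly correlated (B is always one point below A), so u4's missing rating for B is predicted from A alone.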
