An efficient parallel similarity matrix construction on MapReduce for collaborative filtering

Collaborative filtering has become popular in recommendation systems. However, as the volume of data grows rapidly, constructing the similarity matrix becomes a performance bottleneck. The MapReduce framework proposed by Google has recently been widely used for data-intensive applications. In this work, we therefore propose ConSimMR, an efficient parallel algorithm for constructing a similarity matrix using MapReduce. We first partition the set of items into disjoint groups so that items rated by similar users tend to fall in the same group. We next compute the similarity of every pair of items belonging to the same group. Finally, we calculate the similarity of every pair of items belonging to different groups. In this last step, we use the rating list of each user rather than that of each item, which lets us compute the similarities in parallel and improves performance. We conducted experiments comparing ConSimMR with previous algorithms on real-life data sets and confirmed both the efficiency and the scalability of ConSimMR.
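The three steps above can be illustrated with a minimal single-process sketch in Python. It is not the ConSimMR implementation: the abstract does not specify the partitioning rule or the similarity measure, so the sketch assumes cosine similarity and a hypothetical grouping key (the smallest user id among an item's raters). All function names are illustrative, and the map and reduce phases are simulated with plain loops rather than a MapReduce runtime.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def invert(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, r in items.items():
            by_item[item][user] = r
    return by_item


def partition_items(by_item):
    """Step 1 (assumed rule): group key = smallest user id among the raters."""
    return {item: min(users) for item, users in by_item.items()}


def same_group_similarities(by_item, group_of, norms):
    """Step 2: compare item rating lists for pairs inside each group."""
    groups = defaultdict(list)
    for item, g in group_of.items():
        groups[g].append(item)
    sims = {}
    for members in groups.values():
        for i, j in combinations(sorted(members), 2):
            common = by_item[i].keys() & by_item[j].keys()
            dot = sum(by_item[i][u] * by_item[j][u] for u in common)
            if dot:  # keep only pairs with co-rating users
                sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims


def cross_group_similarities(ratings, group_of, norms):
    """Step 3: each user's rating list emits partial products (map),
    which are then summed per item pair (reduce)."""
    dot = defaultdict(float)
    for items in ratings.values():  # one "map" call per user
        for (i, ri), (j, rj) in combinations(sorted(items.items()), 2):
            if group_of[i] != group_of[j]:
                dot[(i, j)] += ri * rj
    return {(i, j): d / (norms[i] * norms[j]) for (i, j), d in dot.items()}


def build_similarity(ratings):
    """Run the three steps and return {(item_i, item_j): cosine}."""
    by_item = invert(ratings)
    norms = {i: sqrt(sum(r * r for r in us.values())) for i, us in by_item.items()}
    group_of = partition_items(by_item)
    sims = same_group_similarities(by_item, group_of, norms)
    sims.update(cross_group_similarities(ratings, group_of, norms))
    return sims
```

Because each user's rating list is processed independently in the third step, the loop over users in `cross_group_similarities` is the part that would map naturally onto parallel map tasks, with the per-pair summation acting as the reduce.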
