Paper information - MapReduce-Based SimRank Computation and Its Application in Social Recommender System


Graph-based analysis has recently attracted considerable interest, with applications including social network analysis, recommendation systems, and document classification and clustering. A graph is an abstraction that naturally captures data objects and the relationships among them: objects are represented as nodes and relationships as edges. Many applications need to compute similarities among nodes. SimRank is one of the simple and intuitive algorithms for this purpose, and it is rigorously grounded in random walk theory. Existing methods for SimRank computation suffer from one limitation: the computing cost can be very high in practice. A few techniques have been proposed to optimize the computation of SimRank, but their performance is still limited by the processing capacity of a single computer. Ideally, we would like to develop new parallel solutions that offer improved processing power for computing SimRank on large data sets. In this paper, we propose parallel algorithms for SimRank computation on the MapReduce framework, and more specifically on its open source implementation, Hadoop. Two different parallel methods are proposed, and their performance is evaluated and compared. Furthermore, we apply the proposed methods to similarity computation in order to recommend appropriate products to users in social recommender systems.
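To make the cost concern concrete, the following is a minimal single-machine sketch of the standard naive SimRank iteration, in which the similarity of two distinct nodes is a decay factor times the average similarity of their in-neighbor pairs. It is not the parallel method proposed in the paper: the abstract does not describe how the two MapReduce methods partition the work, so no such decomposition is attempted here. The function name `simrank`, the `in_neighbors` dictionary layout, the decay value 0.8, and the three-node example graph are all assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive SimRank iteration (illustrative sketch, not the paper's method).

    in_neighbors maps every node to the list of nodes that have an edge
    into it. Every node that appears in a list must also be a key.
    """
    nodes = list(in_neighbors)
    # Initial scores: a node is fully similar to itself, 0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node with no in-neighbors has similarity 0 to any other node.
                new_sim[(a, b)] = 0.0
                continue
            # Sum the previous-round similarity over all in-neighbor pairs.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical example graph: node "c" links to both "a" and "b".
example_graph = {"a": ["c"], "b": ["c"], "c": []}
scores = simrank(example_graph)
print(scores[("a", "b")])
```

Each round visits every node pair and, for each pair, every pair of in-neighbors, which is why the cost grows quickly with graph size and why distributing the computation is attractive.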
