An efficient algorithm for density-balanced partitioning in distributed PageRank

Google's PageRank is one of the most notable approaches to web search ranking. Web pages are generally modeled as a web-link graph, in which each web page is a node and each link between two pages is an edge. Computing PageRank on a large web-link graph with a single computer is inefficient, and distributed systems such as peer-to-peer (P2P) networks are a viable way to address this limitation. In P2P-based PageRank, each computational peer holds a partial web-link graph, i.e., a sub-graph of the global web-link graph, and computes its PageRank locally. The convergence time of a PageRank calculation depends on the density of the web-link graph, i.e., the ratio of the number of edges to the number of nodes: the denser the graph, the longer the calculation takes to converge. Because the overall execution time of P2P-based web ranking is determined by the slowest peer to finish its local ranking, partitioning the web-link graph into sub-graphs of balanced density is highly desirable. This paper addresses the density-balanced partitioning problem and proposes an efficient algorithm for it. Experimental results show that the proposed algorithm effectively partitions a graph into density-balanced sub-graphs at an acceptable cost.
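To make the density criterion concrete, the Python sketch below measures the density (edges divided by nodes) of each sub-graph in a partition and balances it with a simple greedy heuristic. It is not the algorithm proposed in this paper, whose details are not given here. The function names (`out_degrees`, `density`, `greedy_density_balance`, `imbalance`) are hypothetical, and the sketch assumes that each edge is counted in the sub-graph that holds its source page and that every peer should receive roughly the same number of pages.

```python
import heapq
import math
from collections import defaultdict


def out_degrees(edges):
    """Count out-links per page; pages that only receive links get degree 0."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 1
        deg.setdefault(dst, 0)
    return dict(deg)


def density(nodes, deg):
    """Edges / nodes of one sub-graph (assumption: an edge belongs to its source's part)."""
    if not nodes:
        return 0.0
    return sum(deg[v] for v in nodes) / len(nodes)


def imbalance(parts, deg):
    """Gap between the densest and the sparsest sub-graph."""
    ds = [density(p, deg) for p in parts]
    return max(ds) - min(ds)


def greedy_density_balance(deg, k):
    """Hypothetical heuristic: k parts of at most ceil(n/k) pages each.

    Pages are visited from highest to lowest out-degree, and each page goes to
    the non-full part that currently holds the fewest edges. With near-equal
    part sizes, balancing edge counts also balances density.
    """
    cap = math.ceil(len(deg) / k)
    parts = [[] for _ in range(k)]
    heap = [(0, i) for i in range(k)]  # (edges held so far, part index)
    heapq.heapify(heap)
    for v in sorted(deg, key=lambda u: (-deg[u], u)):
        load, i = heapq.heappop(heap)
        parts[i].append(v)
        if len(parts[i]) < cap:  # full parts are not pushed back
            heapq.heappush(heap, (load + deg[v], i))
    return parts


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3),
             (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 0)]
    deg = out_degrees(edges)
    naive = [[0, 1, 2], [3, 4, 5]]
    parts = greedy_density_balance(deg, 2)
    print(parts)                   # prints [[0, 3, 4], [1, 2, 5]]
    print(imbalance(naive, deg))   # prints 2.0
    print(imbalance(parts, deg))   # prints 0.0
```

On this toy graph, splitting the pages in index order yields sub-graph densities of 3.0 and 1.0, so one peer would carry a much denser local graph, while the greedy assignment yields 2.0 for both. Fixing the part sizes is a design choice that reduces density balancing to balancing edge counts; a real partitioner would also need to consider factors this sketch ignores, such as the links cut between peers.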
