A partitioning technique for improving the performance of PageRank on Hadoop

There has been a great deal of research on large-scale graph analysis on Hadoop. The performance of Hadoop-based graph analysis is affected by how the data is partitioned. The effectiveness of a partitioning scheme depends on how well it preserves data locality on each node of the cluster, and this varies with the problem being solved. One partitioning approach known to be effective is partitioning data by domain; for instance, when analyzing web graphs, it can be very useful to partition pages by their domains. However, the improvement gained from this kind of partitioning is limited to specific problems. In this paper, we propose a data partitioning technique based on semi-clustering for analyzing web graphs with the PageRank algorithm on Hadoop. Experiments show that our partitioning technique improves the performance of PageRank computation as the number of iterations increases. This method can be very effective for large-scale graph processing.
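Why partitioning matters for iterative PageRank can be illustrated with a small, single-machine sketch. This is NOT the paper's semi-clustering technique. It only contrasts a plain hash partitioning of pages (loosely analogous to Hadoop's default hash partitioning) with the domain-based partitioning mentioned above, and counts the links that cross partitions, since in a MapReduce PageRank job those rank contributions must be shuffled between nodes on every iteration. All function names, the use of CRC32 as the hash, and the toy web graph are hypothetical assumptions for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse
import zlib


def hash_partition(url, num_parts):
    # Hypothetical baseline: place each page by a stable hash of its full URL.
    return zlib.crc32(url.encode("utf-8")) % num_parts


def domain_partition(url, num_parts):
    # Hypothetical domain-based scheme: all pages of one host share a partition.
    host = urlparse(url).netloc
    return zlib.crc32(host.encode("utf-8")) % num_parts


def cross_partition_edges(graph, partition, num_parts):
    # Count links whose endpoints fall in different partitions; in a
    # MapReduce PageRank job these contributions cross the network each iteration.
    return sum(
        1
        for src, dsts in graph.items()
        for dst in dsts
        if partition(src, num_parts) != partition(dst, num_parts)
    )


def pagerank(graph, damping=0.85, iterations=20):
    # Plain iterative PageRank over an adjacency dict {page: [linked pages]}.
    nodes = set(graph) | {d for dsts in graph.values() for d in dsts}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # "Map" step: each page splits its rank evenly over its out-links.
        contrib = defaultdict(float)
        dangling = 0.0
        for v in nodes:
            out = graph.get(v, [])
            if out:
                share = rank[v] / len(out)
                for d in out:
                    contrib[d] += share
            else:
                dangling += rank[v]  # pages without out-links
        # "Reduce" step: sum contributions; spread dangling mass uniformly.
        rank = {
            v: (1 - damping) / n + damping * (contrib[v] + dangling / n)
            for v in nodes
        }
    return rank


if __name__ == "__main__":
    # Hypothetical toy web graph spanning two domains.
    web = {
        "http://a.com/": ["http://a.com/x", "http://a.com/y"],
        "http://a.com/x": ["http://a.com/", "http://b.org/"],
        "http://a.com/y": ["http://a.com/"],
        "http://b.org/": ["http://b.org/p"],
        "http://b.org/p": ["http://b.org/", "http://a.com/"],
    }
    print("hash cut:", cross_partition_edges(web, hash_partition, 2))
    print("domain cut:", cross_partition_edges(web, domain_partition, 2))
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{score:.4f} {page}")
```

Under domain partitioning, links within a site never cross partitions, so the cut is bounded by the number of inter-domain links (two in the toy graph). The abstract notes that this kind of gain is problem-specific, which is the motivation for a more general scheme such as the proposed semi-clustering approach.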