Paper Information - Parallelizing the Computation of PageRank

Parallelizing the Computation of PageRank

This paper presents a technique we call ParaSolve that exploits the sparsity structure of the web graph matrix to improve the degree of parallelism in a number of distributed approaches for computing PageRank. Specifically, a typical algorithm (such as power iteration or GMRES) for solving the linear system corresponding to PageRank, call it LinearSolve, may be converted to a distributed algorithm, Distrib(LinearSolve), by partitioning the problem and applying a standard technique (i.e., Distrib). By reducing the number of inter-partition multiplications, we may greatly increase the degree of parallelism while achieving a similar degree of accuracy. This should lead to increasingly better performance as we utilize more processors. For example, using GeoSolve (a variant of Jacobi iteration) as our linear solver and the 2001 web graph from Stanford's WebBase project, on 12 processors ParaSolve(GeoSolve) outperforms Distrib(GeoSolve) by a factor of 1.4, while on 32 processors the performance ratio improves to 2.8.
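To make the abstract's terms concrete, the following is a minimal single-process sketch, not the paper's GeoSolve or ParaSolve. It assumes the common linear-system formulation (I - alpha * P^T) x = (1 - alpha) * v with a uniform v, a graph with no dangling nodes and no self-loops, and a damping factor of 0.85. The names `pagerank_jacobi` and `count_inter_partition_links` are hypothetical. The second helper only counts the links that cross a given partition, the kind of inter-partition multiplication the abstract refers to; it does not show how ParaSolve reduces them.

```python
def pagerank_jacobi(out_links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Jacobi-style iteration for (I - alpha * P^T) x = (1 - alpha) * v.

    out_links maps each node to the list of nodes it links to.
    Assumptions (not from the paper): every node has at least one
    out-link, there are no self-loops, and v is uniform. Under these
    assumptions the system's diagonal is 1, so the Jacobi update is
    x_new = (1 - alpha) * v + alpha * P^T x.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    x = {u: 1.0 / n for u in nodes}
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Start from the right-hand side (1 - alpha) * v.
        new_x = {u: (1.0 - alpha) / n for u in nodes}
        # Add alpha * P^T x: each node spreads its value over its out-links.
        for u in nodes:
            share = alpha * x[u] / len(out_links[u])
            for w in out_links[u]:
                new_x[w] += share
        residual = sum(abs(new_x[u] - x[u]) for u in nodes)
        x = new_x
        if residual < tol:
            break
    return x, iterations


def count_inter_partition_links(out_links, partition_of):
    """Count links whose endpoints lie in different partitions.

    In a partitioned solver each such link is a multiplication whose
    input lives on another processor.
    """
    return sum(
        1
        for u, targets in out_links.items()
        for w in targets
        if partition_of[u] != partition_of[w]
    )


# Tiny illustrative graph (made up for this sketch).
graph = {0: [1, 2], 1: [2], 2: [0]}
ranks, iterations = pagerank_jacobi(graph)
crossing = count_inter_partition_links(graph, {0: 0, 1: 0, 2: 1})
```

In a distributed setting each processor would own one partition's entries of `x`, and only the links counted by `count_inter_partition_links` would require values from another processor.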

Amy Greenwald | John R. Wicks

[1] Frank McSherry, et al. A uniform approach to accelerated PageRank computation, 2005, WWW '05.

[2] Gene H. Golub, et al. Exploiting the Block Structure of the Web for Computing, 2003.

[3] Wolfgang Nejdl, et al. Efficient Parallel Computation of PageRank, 2006, ECIR.

[4] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.

[5] David F. Gleich, et al. Fast Parallel PageRank: A Linear System Approach, 2004.