Estimating PageRank on graph streams

This article focuses on computations on large graphs (e.g., the web-graph) where the edges of the graph are presented as a stream. The objective in the streaming model is to use a small amount of memory (preferably sublinear in the number of nodes <i>n</i>) and a small number of passes. In this model, we show how to perform several graph computations, including estimating the probability distribution after a random walk of length <i>l</i>, the mixing time <i>M</i>, and related quantities such as the conductance of the graph. By applying our algorithm for computing the probability distribution to the web-graph, we can estimate the <i>PageRank</i> <i>p</i> of any node up to an additive error of √ε <i>p</i> + ε in <i>Õ</i>(√(<i>M</i>/α)) passes and <i>Õ</i>(min(<i>n</i>α + (1/ε)√(<i>M</i>/α) + (1/ε)<i>M</i>α, α<i>n</i>√<i>M</i>α + (1/ε)√(<i>M</i>/α))) space, for any α ∈ (0,1]. Specifically, for ε = <i>M</i>/<i>n</i> and α = <i>M</i><sup>−1/2</sup>, we can compute approximate PageRank values in <i>Õ</i>(<i>nM</i><sup>−1/4</sup>) space and <i>Õ</i>(<i>M</i><sup>3/4</sup>) passes. In comparison, a standard implementation of the PageRank algorithm takes <i>O</i>(<i>n</i>) space and <i>O</i>(<i>M</i>) passes. We also give an approach that approximates the PageRank values in just <i>Õ</i>(1) passes, although this requires <i>Õ</i>(<i>nM</i>) space.
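
For comparison with the bounds above, the standard baseline (linear space, one pass per iteration) can be sketched as power iteration that re-reads the edge stream on every iteration. This is only an illustrative sketch of that baseline, not the algorithm of this article. The function name `streaming_pagerank`, the re-iterable `edge_stream` callable, the damping factor of 0.85, and the uniform redistribution of mass from nodes without out-links are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple

Edge = Tuple[int, int]  # (source node, target node), nodes numbered 0..n-1


def streaming_pagerank(
    edge_stream: Callable[[], Iterable[Edge]],  # hypothetical: returns a fresh pass over the edges
    n: int,
    passes: int,
    damping: float = 0.85,  # assumed value, not taken from the article
) -> List[float]:
    """Baseline power iteration: O(n) memory, one stream pass per iteration."""
    # One preliminary pass to count out-degrees; only O(n) counters are kept.
    out_deg = [0] * n
    for u, _ in edge_stream():
        out_deg[u] += 1

    rank = [1.0 / n] * n
    for _ in range(passes):
        # Mass sitting on nodes without out-links is spread uniformly (assumption).
        dangling = sum(rank[u] for u in range(n) if out_deg[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = [base] * n
        # Single pass over the stream: push each node's rank along its out-edges.
        for u, v in edge_stream():
            nxt[v] += damping * rank[u] / out_deg[u]
        rank = nxt
    return rank


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2)]
    print(streaming_pagerank(lambda: iter(edges), n=3, passes=20))
```

The sketch never stores the edges, only two length-<i>n</i> vectors and the out-degree counters, and it makes `passes + 1` passes over the stream in total. The number of iterations needed for convergence grows with the mixing time, which is the cost the space/pass trade-offs above aim to reduce.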
