Graph distances in the streaming model: the value of space

We investigate the importance of space when solving problems based on graph distance in the streaming model. In this model, the input graph is presented as a stream of edges in an arbitrary order. The main computational restriction of the model is that we have limited space and therefore cannot store all the streamed data; we are forced to make space-efficient summaries of the data as we go along. For a graph of <i>n</i> vertices and <i>m</i> edges, we show that testing many graph properties, including connectivity (<i>ergo</i> any reasonable decision problem about distances) and bipartiteness, requires Ω(<i>n</i>) bits of space. Given this, we then investigate how the power of the model increases as we relax our space restriction. Our main result is an efficient randomized algorithm that constructs a (2<i>t</i> + 1)-spanner in one pass. With high probability, it uses <i>O</i>(<i>t</i>·<i>n</i><sup>1+1/<i>t</i></sup> log<sup>2</sup> <i>n</i>) bits of space and processes each edge in the stream in <i>O</i>(<i>t</i><sup>2</sup>·<i>n</i><sup>1/<i>t</i></sup> log <i>n</i>) time. We find approximations to diameter and girth via the constructed spanner. For <i>t</i> = Ω(log <i>n</i>/log log <i>n</i>), the space requirement of the algorithm is <i>O</i>(<i>n</i> polylog <i>n</i>), and the per-edge processing time is <i>O</i>(polylog <i>n</i>). We also show a corresponding lower bound of <i>t</i> for the approximation ratio achievable when the space restriction is <i>O</i>(<i>t</i>·<i>n</i><sup>1+1/<i>t</i></sup> log<sup>2</sup> <i>n</i>). We then consider the scenario in which we are allowed multiple passes over the input stream. Here, we investigate whether allowing these extra passes will compensate for a given space restriction. We show that finding vertices at distance <i>d</i> from a particular vertex will always take <i>d</i> passes, for all <i>d</i> ∈ {1,...,<i>t</i>/2}, when the space restriction is <i>o</i>(<i>n</i><sup>1+1/<i>t</i></sup>).
For girth, we show the existence of a direct trade-off between space and passes in the form of a lower bound on the product of the space requirement and number of passes. Finally, we conclude with two general techniques for speeding up the per-edge computation time of streaming algorithms while increasing the space by at most a log factor.
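
To make the one-pass spanner setting concrete, the sketch below keeps a subgraph as edges stream by. It is not the randomized algorithm of this paper: it swaps in the simpler, well-known greedy rule (keep an edge only if its endpoints are currently more than 2t + 1 hops apart), which also yields a (2t + 1)-spanner for unweighted graphs but does not come with the per-edge time bound stated above. The function names `bounded_distance` and `greedy_stream_spanner` are hypothetical and chosen for illustration.

```python
from collections import deque


def bounded_distance(adj, src, dst, limit):
    """Return the hop distance from src to dst in adj if it is <= limit, else None."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == limit:
            continue  # do not explore beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def greedy_stream_spanner(edge_stream, t):
    """One pass over an unweighted edge stream; returns the kept edges.

    Assumption: greedy rule, not the paper's randomized construction.
    An edge (u, v) is kept only if u and v are more than 2t + 1 hops
    apart in the subgraph kept so far.
    """
    stretch = 2 * t + 1
    adj = {}   # adjacency sets of the kept subgraph
    kept = []
    for u, v in edge_stream:
        if bounded_distance(adj, u, v, stretch) is None:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
            kept.append((u, v))
    return kept
```

Every discarded edge already has a path of at most 2t + 1 hops between its endpoints in the kept subgraph, so replacing each edge of a shortest path by such a detour stretches distances by a factor of at most 2t + 1. The kept subgraph contains no short cycles, which is what limits its size.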
