Graph Distances in the Data-Stream Model

We explore problems related to computing graph distances in the data-stream model. The goal is to design algorithms that can process the edges of a graph in an arbitrary order given only a limited amount of working memory. We are motivated by both the practical challenge of processing massive graphs such as the web graph and the desire for a better theoretical understanding of the data-stream model. In particular, we are interested in the trade-offs between model parameters such as per-data-item processing time, total space, and the number of passes that may be taken over the stream. These trade-offs are more apparent when considering graph problems than they were in previous streaming work that solved problems of a statistical nature. Our results include the following:
(1) Spanner construction: There exists a single-pass, $\tilde{O}(t n^{1+1/t})$-space, $\tilde{O}(t^2 n^{1/t})$-time-per-edge algorithm that constructs a $(2t+1)$-spanner. For $t = \Omega(\log n / \log\log n)$, the algorithm satisfies the semistreaming space restriction of $O(n\,\mathrm{polylog}\,n)$ and has per-edge processing time $O(\mathrm{polylog}\,n)$. This resolves an open question from [J. Feigenbaum et al., Theoret. Comput. Sci., 348 (2005), pp. 207-216].
(2) Breadth-first-search (BFS) trees: For any even constant $k$, we show that any algorithm that computes the first $k$ layers of a BFS tree from a prescribed node with probability at least 2/3 requires either greater than $k/2$ passes or $\Omega(n^{1+1/k})$ space. Since constructing BFS trees is an important subroutine in many traditional graph algorithms, this demonstrates the need for new algorithmic techniques when processing graphs in the data-stream model.
(3) Graph-distance lower bounds: Any $t$-approximation of the distance between two nodes requires $\Omega(n^{1+1/t})$ space. We also prove lower bounds for determining the length of the shortest cycle and other graph properties.
(4) Techniques for decreasing per-edge processing: We discuss two general techniques for speeding up the per-edge computation time of streaming algorithms while increasing the space by only a small factor.
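
To make the spanner result in item (1) concrete, the following is a minimal Python sketch of what a single-pass $(2t+1)$-spanner construction can look like. It is not the algorithm the abstract refers to: it is the simple greedy rule (keep an edge only if its endpoints are currently more than $2t+1$ hops apart in the spanner built so far), and its per-edge cost is a bounded breadth-first search rather than the $\tilde{O}(t^2 n^{1/t})$ bound stated above. The function names `within_distance` and `greedy_stream_spanner` are hypothetical, and the graph is assumed to be unweighted and undirected.

```python
from collections import defaultdict, deque


def within_distance(adj, source, target, limit):
    """Return True if target is at most `limit` hops from source in adj."""
    if source == target:
        return True
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == limit:
            continue  # do not expand beyond the hop limit
        for nxt in adj[node]:
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


def greedy_stream_spanner(edge_stream, t):
    """Read each edge once; return adjacency sets of a (2t+1)-spanner.

    Illustrative greedy rule only (an assumption of this sketch): an edge is
    stored exactly when the stored subgraph does not already connect its
    endpoints by a path of at most 2t+1 edges.
    """
    stretch = 2 * t + 1
    adj = defaultdict(set)
    for u, v in edge_stream:
        if u == v:
            continue  # self-loops never affect distances
        if not within_distance(adj, u, v, stretch):
            adj[u].add(v)
            adj[v].add(u)
    return adj
```

Every discarded edge is replaced by a stored path of at most $2t+1$ edges, so any shortest path in the input graph is stretched by at most that factor in the output. Only the stored subgraph is kept in memory; the stream itself is never revisited.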
