A Survey on Streaming Algorithms for Massive Graphs

Streaming is an important paradigm for handling massive graphs that are too large to fit in main memory. In the streaming computational model, algorithms are restricted to using much less space than they would need to store the input. Furthermore, the input is accessed sequentially and can therefore be viewed as a stream of data elements. Despite these restrictions, streaming algorithms exist for many graph problems. We survey a set of algorithms that compute graph statistics, matchings and distances in a graph, and random walks. These are basic graph problems, and the algorithms that solve them may be used as building blocks in graph-data management and mining.
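To make the model concrete, the following is a minimal sketch of a one-pass streaming computation on an edge stream: the standard greedy maximal matching. It is an illustration only and is not claimed to be one of the surveyed algorithms. The function name `greedy_matching` and the representation of the stream as an iterable of vertex pairs are assumptions made for this example. The sketch reads each edge once, in order, and stores only the matched vertices and the matching itself, so its memory grows with the number of vertices rather than the number of edges.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # assumed edge representation: a pair of vertex ids


def greedy_matching(edge_stream: Iterable[Edge]) -> List[Edge]:
    """One sequential pass over the edges; keeps O(n) state, never the full edge list."""
    matched: Set[int] = set()    # vertices already covered by the matching
    matching: List[Edge] = []    # edges selected so far
    for u, v in edge_stream:     # each edge is seen exactly once, in stream order
        # Keep the edge only if both endpoints are still free (self-loops are skipped).
        if u != v and u not in matched and v not in matched:
            matched.add(u)
            matched.add(v)
            matching.append((u, v))
    return matching


if __name__ == "__main__":
    # A generator stands in for a stream that is too large to store.
    stream = (e for e in [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6)])
    print(greedy_matching(stream))
```

The result is a maximal matching, which is always at least half the size of a maximum matching. The output depends on the order in which edges arrive, which is typical of single-pass algorithms.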
