Graph stream algorithms: a survey

Over the last decade, there has been considerable interest in designing algorithms for processing massive graphs in the data stream model. The original motivation was twofold: (a) in many applications, the dynamic graphs that arise are too large to be stored in the main memory of a single machine, and (b) considering graph problems yields new insights into the complexity of stream computation. However, the techniques developed in this area are now finding applications in other areas, including data structures for dynamic graphs, approximation algorithms, and distributed and parallel computation. We survey the state-of-the-art results, identify general techniques, and highlight some simple algorithms that illustrate basic ideas.
