Kernelization via Sampling with Applications to Dynamic Graph Streams

In this paper we present a simple but powerful subgraph sampling primitive that is applicable in a variety of computational models, including dynamic graph streams (where the input graph is defined by a sequence of edge/hyperedge insertions and deletions) and distributed systems such as MapReduce. In the case of dynamic graph streams, we use this primitive to prove the following results:

* Matching: Our main result for matchings is that there exists an $\tilde{O}(k^2)$ space algorithm that returns the edges of a maximum matching on the assumption that its cardinality is at most $k$. The best previous algorithm used $\tilde{O}(kn)$ space, where $n$ is the number of vertices in the graph, and we prove our result is optimal up to logarithmic factors. Our algorithm has $\tilde{O}(1)$ update time. We also show that there exists an $\tilde{O}(n^2/\alpha^3)$ space algorithm that returns an $\alpha$-approximation for matchings of arbitrary size. In independent work, Assadi et al. (arXiv 2015) proved this is optimal and provided an alternative algorithm. We generalize our exact and approximate algorithms to weighted matching. While there has been a substantial amount of work on approximate matching in insert-only graph streams, these are the first non-trivial results in the dynamic setting.

* Vertex Cover and Hitting Set: There exists an $\tilde{O}(k^d)$ space algorithm that solves the minimum hitting set problem, where $d$ is the cardinality of the input sets and $k$ is an upper bound on the size of the minimum hitting set. We prove this is optimal up to logarithmic factors. Our algorithm has $\tilde{O}(1)$ update time. The case $d=2$ corresponds to minimum vertex cover.

Finally, we consider a larger family of parameterized problems (including b-matching, disjoint paths, and vertex coloring, among others) for which our subgraph sampling primitive yields fast, small-space dynamic graph stream algorithms. We then show lower bounds for natural problems outside this family.
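To make the hitting set problem statement concrete, the following is a minimal brute-force sketch of the offline problem only. It is not the paper's streaming algorithm and uses no sampling; the function name `min_hitting_set` and the example inputs are hypothetical, chosen for illustration. It shows how the parameter k bounds the search and why the case d=2 is minimum vertex cover.

```python
from itertools import combinations


def min_hitting_set(sets, k):
    """Return a smallest set of elements that intersects every input set,
    or None if no hitting set of size at most k exists.

    Brute force over all candidate subsets; illustrative only.
    """
    universe = sorted(set().union(*sets)) if sets else []
    for size in range(k + 1):
        for candidate in combinations(universe, size):
            chosen = set(candidate)
            # A hitting set must share at least one element with every input set.
            if all(chosen & s for s in sets):
                return chosen
    return None


# d = 2: each input set is an edge, so a hitting set is a vertex cover.
edges = [{1, 2}, {2, 3}, {3, 4}]  # path on four vertices
cover = min_hitting_set(edges, k=2)

# d = 3: each input set is a hyperedge with three vertices.
hyperedges = [{1, 2, 3}, {3, 4, 5}]
hit = min_hitting_set(hyperedges, k=1)
```

Here `cover` has two vertices, and calling `min_hitting_set(edges, k=1)` returns `None` because no single vertex touches all three edges of the path.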
