Tight bounds for single-pass streaming complexity of the set cover problem

We resolve the space complexity of single-pass streaming algorithms for approximating the classic set cover problem. For finding an α-approximate set cover (for α = o(√n)) via a single-pass streaming algorithm, we show that Θ(mn/α) space is both sufficient and necessary (up to an O(log n) factor); here m denotes the number of sets and n denotes the size of the universe. This provides a strong negative answer to the open question posed by Indyk (2015) regarding the possibility of a single-pass algorithm with a small approximation factor that uses sublinear space. We further study the problem of estimating the size of a minimum set cover (as opposed to finding the actual sets), and establish that an additional factor-α saving in space is achievable in this case and that this is the best possible. In other words, we show that Θ(mn/α²) space is both sufficient and necessary (up to logarithmic factors) for estimating the size of a minimum set cover to within a factor of α. Our algorithm in fact works for the more general problem of estimating the optimal value of a covering integer program. On the other hand, our lower bound holds even for set cover instances where the sets are presented in a random order.
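To make the two bounds concrete, the following minimal sketch evaluates the leading terms mn/α (finding a cover) and mn/α² (estimating its size) for sample parameters. The function names and the sample values of m, n, and α are illustrative assumptions, not part of the paper, and the hidden constants and logarithmic factors are dropped entirely.

```python
# Illustrative only: leading terms of the space bounds stated in the abstract,
# with constants and logarithmic factors ignored (an assumption of this sketch).

def finding_space(m: int, n: int, alpha: float) -> float:
    """Leading term mn/alpha for finding an alpha-approximate set cover."""
    return m * n / alpha


def estimating_space(m: int, n: int, alpha: float) -> float:
    """Leading term mn/alpha^2 for estimating the minimum set cover size."""
    return m * n / alpha ** 2


if __name__ == "__main__":
    # Hypothetical instance: one million sets over a universe of 10,000 elements.
    m, n, alpha = 1_000_000, 10_000, 10
    # The abstract states the finding bound for alpha = o(sqrt(n)); this check
    # is only a rough stand-in for that asymptotic condition.
    assert alpha < n ** 0.5
    find = finding_space(m, n, alpha)
    est = estimating_space(m, n, alpha)
    print(f"finding:    ~{find:.3g} units of space")
    print(f"estimating: ~{est:.3g} units of space")
    print(f"saving factor: {find / est:.0f}")
```

For any choice of parameters, the ratio of the two leading terms is exactly α, which is the additional saving the abstract attributes to estimation over finding the sets.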

[1]  Robert Krauthgamer, et al.  An FPTAS for minimizing indefinite quadratic forms over integers in polyhedra, 2016, SODA 2016.

[2]  Andrew Chi-Chih Yao, et al.  Informational complexity and the direct sum problem for simultaneous message complexity, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[3]  Lise Getoor, et al.  On Maximum Coverage in the Streaming Model & Application to Multi-topic Blog-Watch, 2009, SDM.

[4]  Ziv Bar-Yossef, et al.  An information statistics approach to data stream and communication complexity, 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.

[5]  Piotr Indyk, et al.  Towards Tight Bounds for the Streaming Set Cover Problem, 2015, PODS.

[6]  Russell Impagliazzo, et al.  Constructive Proofs of Concentration Bounds, 2010, APPROX-RANDOM.

[7]  S. Muthukrishnan, et al.  Data streams: algorithms and applications, 2005, SODA '03.

[8]  Adi Rosén, et al.  Semi-Streaming Set Cover (Extended Abstract), 2014, ICALP.

[9]  A. Razborov.  Communication Complexity, 2011.

[10]  Graham Cormode, et al.  Robust lower bounds for communication and stream computation, 2008, Theory Comput.

[11]  David Steurer, et al.  Analytical approach to parallel repetition, 2013, STOC.

[12]  Noam Nisan, et al.  The Communication Complexity of Approximate Set Packing and Covering, 2002, ICALP.

[13]  Graham Cormode, et al.  Set cover algorithms for very large datasets, 2010, CIKM.

[14]  Noga Alon, et al.  The space complexity of approximating the frequency moments, 1996, STOC '96.

[15]  Peter Slavík.  A Tight Analysis of the Greedy Algorithm for Set Cover, 1997, J. Algorithms.

[16]  Thomas M. Cover, et al.  Elements of Information Theory, 2005.

[17]  Mert Saglam, et al.  Tight bounds for data stream algorithms and communication problems, 2011.

[18]  Carsten Lund, et al.  On the hardness of approximating minimization problems, 1994, JACM.

[19]  Michael Saks, et al.  Information theory methods in communication complexity, 2012.

[20]  Xi Chen, et al.  How to compress interactive communication, 2010, STOC '10.

[21]  Aravind Srinivasan, et al.  Randomized Distributed Edge Coloring via an Extension of the Chernoff-Hoeffding Bounds, 1997, SIAM J. Comput.

[22]  Vahab S. Mirrokni, et al.  Almost Optimal Streaming Algorithms for Coverage Problems, 2016, SPAA.

[23]  David P. Woodruff, et al.  Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with sub-constant error, 2011, SODA '11.

[24]  Ke Wang, et al.  Proceedings of the SIAM International Conference on Data Mining, SDM 2009, April 30 - May 2, 2009, Sparks, Nevada, USA, 2009, SDM.

[25]  Piotr Indyk, et al.  On Streaming and Communication Complexity of the Set Cover Problem, 2014, DISC.

[26]  Amit Chakrabarti, et al.  Incidence Geometries and the Pass Complexity of Semi-Streaming Set Cover, 2015, SODA.

[27]  David S. Johnson, et al.  Approximation algorithms for combinatorial problems, 1973, STOC.

[28]  A. Kemper, et al.  On Graph Problems in a Semi-streaming Model, 2015.

[29]  Thomas M. Cover, et al.  Elements of Information Theory, Second Edition, 2005.

[30]  Sanjeev Khanna, et al.  Approximating matching size from random streams, 2014, SODA.