Space- and Time-Efficient Algorithm for Maintaining Dense Subgraphs on One-Pass Dynamic Streams

In many graph mining applications it is crucial to handle a stream of updates efficiently in terms of both time and space, yet little was known about algorithms that achieve both. In this paper we study this issue for the densest subgraph problem, which lies at the core of many graph mining applications. Given an input graph, the densest subgraph is the subgraph that maximizes the ratio between the number of edges and the number of nodes. We develop an algorithm that is simultaneously time- and space-efficient for this problem; to the best of our knowledge, it is one of the first of its kind for graph problems. For any ε > 0, our algorithm can, with high probability, maintain a (4+ε)-approximate solution under edge insertions and deletions using Õ(n) space and Õ(1) amortized time per update; here, n is the number of nodes in the graph and Õ hides an O(polylog_{1+ε} n) factor. The approximation ratio can be improved to (2+ε) at the cost of more time. The algorithm can also be extended to a (2+ε)-approximation sublinear-time algorithm and to a distributed-streaming algorithm. Ours is the first streaming algorithm that can maintain the densest subgraph in one pass: prior to this, no algorithm could do so even in the special case of an incremental (insertion-only) stream and even with no restriction on time. The previously best algorithm in this setting required O(log n) passes [BahmaniKV12]. The space required by our algorithm is tight up to a polylogarithmic factor.
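To make the density objective above concrete, the following minimal Python sketch computes the edges-to-nodes ratio of an induced subgraph and runs the classical static greedy peeling heuristic, which is known to give a 2-approximation of the maximum density. It is not the one-pass dynamic-stream algorithm of this paper: it assumes the whole graph fits in memory as an edge list, and the function names `density` and `greedy_peel` are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def density(nodes, edges):
    """Density of the subgraph induced by `nodes`: induced edges / nodes."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    induced = sum(1 for u, v in edges if u in nodes and v in nodes)
    return induced / len(nodes)


def greedy_peel(edges):
    """Static greedy peeling (illustrative, not the streaming algorithm).

    Repeatedly removes a minimum-degree node and returns the densest
    intermediate subgraph seen, together with its density.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; parallel edges collapse in the sets
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # current edge count
    best_density, best_nodes = 0.0, set()

    while remaining:
        current = m / len(remaining)
        if current > best_density:
            best_density, best_nodes = current, set(remaining)
        # Peel one node of minimum remaining degree.
        u = min(remaining, key=lambda x: len(adj[x]))
        for w in adj[u]:
            adj[w].discard(u)
        m -= len(adj[u])
        adj[u].clear()
        remaining.remove(u)

    return best_nodes, best_density


# A 4-clique on nodes 0..3 with a pendant path 3-4-5 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
best_nodes, best = greedy_peel(edges)
print(sorted(best_nodes), best)  # [0, 1, 2, 3] 1.5
```

On this example the whole graph has density 8/6, while the 4-clique has density 6/4 = 1.5, so peeling away the two path nodes yields the denser subgraph. A linear scan for the minimum-degree node keeps the sketch short; a bucket queue would be the usual choice for larger inputs.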

[1]  Bruce M. Kapron,et al.  Dynamic graph connectivity in polylogarithmic worst case time , 2013, SODA.

[2]  Charu C. Aggarwal,et al.  A Survey of Algorithms for Dense Subgraph Discovery , 2010, Managing and Mining Graph Data.

[3]  Subhash Khot,et al.  Near-optimal lower bounds on the multi-party communication complexity of set disjointness , 2003, 18th IEEE Annual Conference on Computational Complexity.

[4]  Ashish Goel,et al.  Efficient Primal-Dual Algorithms for MapReduce , 2013.

[5]  Hossein Jowhari,et al.  Tight bounds for Lp samplers, finding duplicates in streams, and related problems , 2010, PODS.

[6]  Oded Goldreich.  A Brief Introduction to Property Testing , 2011, Studies in Complexity and Cryptography.

[7]  David P. Woodruff,et al.  Spanners and sparsifiers in dynamic streams , 2014, PODC.

[8]  Shimon Even,et al.  An On-Line Edge-Deletion Problem , 1981, JACM.

[9]  Silvio Lattanzi,et al.  Efficient Densest Subgraph Computation in Evolving Graphs , 2015, WWW.

[10]  P. Erdös On the structure of linear graphs , 1946 .

[11]  Monika Henzinger,et al.  Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture , 2015, STOC.

[12]  David P. Woodruff,et al.  Brief Announcement: Applications of Uniform Sampling: Densest Subgraph and Beyond , 2015, SPAA.

[13]  Anna Pagh,et al.  Uniform Hashing in Constant Time and Optimal Space , 2008, SIAM J. Comput..

[14]  Bernard Chazelle,et al.  Approximating the Minimum Spanning Tree Weight in Sublinear Time , 2001, ICALP.

[15]  Ashish Goel,et al.  Perfect Matchings in O(nlog n) Time in Regular Bipartite Graphs , 2013, SIAM J. Comput..

[16]  David P. Woodruff,et al.  1-pass relative-error Lp-sampling with applications , 2010, SODA '10.

[17]  Sudipto Guha,et al.  Graph sketches: sparsification, spanners, and subgraphs , 2012, PODS.

[18]  Sofya Vorotnikova,et al.  Densest Subgraph in Dynamic Graph Streams , 2015, MFCS.

[19]  Krzysztof Onak.  Sublinear Graph Approximation Algorithms , 2010, Property Testing.

[20]  Dana Ron,et al.  Algorithmic and Analysis Techniques in Property Testing , 2010, Found. Trends Theor. Comput. Sci..

[21]  Kun-Lung Wu,et al.  Streaming Algorithms for k-core Decomposition , 2013, Proc. VLDB Endow..

[22]  Artur Czumaj,et al.  Sublinear-Time Algorithms , 2006, Bull. EATCS.

[23]  Ravi Kumar,et al.  Discovering Large Dense Subgraphs in Massive Graphs , 2005, VLDB.

[24]  Kamesh Munagala,et al.  Efficient Primal-Dual Graph Algorithms for MapReduce , 2014, WAW.

[25]  Gerth Stølting Brodal,et al.  Dynamic Representation of Sparse Graphs , 1999, WADS.

[26]  Charalampos E. Tsourakakis,et al.  Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees , 2013, KDD.

[27]  Ronitt Rubinfeld,et al.  Sublinear Time Algorithms , 2011, SIAM J. Discret. Math..

[28]  G. Szekeres,et al.  An inequality for the chromatic number of a graph , 1968 .

[29]  Sergei Vassilvitskii,et al.  Densest Subgraph in Streaming and MapReduce , 2012, Proc. VLDB Endow..

[30]  Andrew V. Goldberg,et al.  Finding a Maximum Density Subgraph , 1984 .

[31]  Jakub W. Pachocki,et al.  Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling , 2015, KDD.

[32]  Charalampos E. Tsourakakis.  A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem , 2014, ArXiv.

[33]  Marco Pellegrini,et al.  Extraction and classification of dense communities in the web , 2007, WWW '07.

[34]  Yin Tat Lee,et al.  Single Pass Spectral Sparsification in Dynamic Streams , 2014, IEEE 55th Annual Symposium on Foundations of Computer Science.

[35]  Charalampos E. Tsourakakis.  The K-clique Densest Subgraph Problem , 2015, WWW.

[36]  J. Ian Munro,et al.  Selection and sorting with limited storage , 1978, 19th Annual Symposium on Foundations of Computer Science (SFCS 1978).

[37]  Sudipto Guha,et al.  Analyzing graph structure via linear measurements , 2012, SODA.

[38]  Divesh Srivastava,et al.  Dense subgraph maintenance under streaming edge weight updates for real-time story identification , 2012, The VLDB Journal.

[39]  Oded Goldreich,et al.  Introduction to Testing Graph Properties , 2010, Property Testing.

[40]  Moses Charikar,et al.  Greedy approximation algorithms for finding dense components in a graph , 2000, APPROX.

[41]  Sudipto Guha,et al.  Spectral Sparsification in Dynamic Graph Streams , 2013, APPROX-RANDOM.

[42]  Samir Khuller,et al.  On Finding Dense Subgraphs , 2009, ICALP.

[43]  Qin Zhang,et al.  Optimal sampling from distributed streams , 2010, PODS '10.

[44]  Huan Liu,et al.  Graph Mining Applications to Social Network Analysis , 2010, Managing and Mining Graph Data.