Efficient Densest Subgraph Computation in Evolving Graphs

Densest subgraph computation has emerged as an important primitive in a wide range of data analysis tasks such as community and event detection. Social media such as Facebook and Twitter are highly dynamic, with new friendship links and tweets being generated incessantly, calling for efficient algorithms that can handle very large and highly dynamic input data. While algorithms for finding densest subgraphs that are either scalable or dynamic have been proposed, a satisfactory solution that addresses both the dynamic nature of the input data and its large size is still missing. We study the densest subgraph problem in the dynamic graph model, for which we present the first scalable algorithm with provable guarantees. In our model, edges are added adversarially while they are removed uniformly at random from the current graph. We show that at any point in time we are able to maintain a 2(1+ε)-approximation of a current densest subgraph, while requiring O(polylog(n+r)) amortized cost per update (with high probability), where r is the total number of update operations executed and n is the maximum number of nodes in the graph. In contrast, a naive algorithm that recomputes a dense subgraph every time the graph changes requires Ω(m) work per update, where m is the number of edges in the current graph. Our theoretical analysis is complemented with an extensive experimental evaluation on large real-world graphs showing that (approximate) densest subgraphs can be maintained efficiently within hundreds of microseconds per update.
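
To make the naive baseline concrete, the sketch below recomputes a dense subgraph from scratch with the classical greedy peeling heuristic, which repeatedly removes a minimum-degree node and keeps the densest intermediate subgraph. This is an illustration under assumptions and not the dynamic algorithm of this paper: the density of a node set S is taken to be |E(S)|/|S|, the graph is undirected and unweighted, and the function name `greedy_densest_subgraph` is hypothetical. Greedy peeling is known to give a 2-approximation for this density measure, and every call touches all edges, which is the Ω(m) per-update cost the abstract refers to.

```python
import heapq
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling on an undirected graph given as (u, v) pairs.

    Returns (node_set, density) with density = |E(S)| / |S|.
    Hypothetical helper for illustration; not the paper's dynamic algorithm.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    if not adj:
        return set(), 0.0

    # degree[u] always counts neighbours of u that are still present.
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    num_edges = sum(degree.values()) // 2
    remaining = len(degree)

    # Min-heap of (degree, node); stale entries are skipped when popped.
    heap = [(d, u) for u, d in degree.items()]
    heapq.heapify(heap)

    removed = set()
    removal_order = []
    best_density = num_edges / remaining
    best_prefix = 0  # number of removals performed before the best snapshot

    while remaining > 1:
        d, u = heapq.heappop(heap)
        if u in removed or d != degree[u]:
            continue  # stale heap entry
        removed.add(u)
        removal_order.append(u)
        remaining -= 1
        num_edges -= degree[u]
        for v in adj[u]:
            if v not in removed:
                degree[v] -= 1
                heapq.heappush(heap, (degree[v], v))
        density = num_edges / remaining
        if density > best_density:
            best_density = density
            best_prefix = len(removal_order)

    best_nodes = set(degree) - set(removal_order[:best_prefix])
    return best_nodes, best_density
```

For example, on a 4-clique over nodes 0 to 3 with one extra edge (3, 4), peeling removes node 4 first and the function returns the clique with density 6/4 = 1.5. A naive dynamic scheme would call this function again after every edge insertion or deletion.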
