Locally Densest Subgraph Discovery

Mining dense subgraphs from a large graph is a fundamental graph mining task and can be widely applied in a variety of application domains such as network science, biology, graph database, web mining, graph compression, and micro-blogging systems. Here a dense subgraph is defined as a subgraph with high density (#.edge / #.node). Existing studies of this problem either focus on finding the densest subgraph or identifying an optimal clique-like dense subgraph, and they adopt a simple greedy approach to find the top-k dense subgraphs. However, their identified subgraphs cannot be used to represent the dense regions of the graph. Intuitively, to represent a dense region, the subgraph identified should be the subgraph with highest density in its local region in the graph. However, it is non-trivial to formally model a locally densest subgraph. In this paper, we aim to discover top-k such representative locally densest subgraphs of a graph. We provide an elegant parameter-free definition of a locally densest subgraph. The definition not only fits well with the intuition, but is also associated with several nice structural properties. We show that the set of locally densest subgraphs in a graph can be computed in polynomial time. We further propose three novel pruning strategies to largely reduce the search space of the algorithm. In our experiments, we use several real datasets with various graph properties to evaluate the effectiveness of our model using four quality measures and a case study. We also test our algorithms on several real web-scale graphs, one of which contains 118.14 million nodes and 1.02 billion edges, to demonstrate the high efficiency of the proposed algorithms.

[1]  Charalampos E. Tsourakakis,et al.  Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees , 2013, KDD.

[2]  Charu C. Aggarwal,et al.  A Survey of Algorithms for Dense Subgraph Discovery , 2010, Managing and Mining Graph Data.

[3]  Takuya Akiba,et al.  Linear-time enumeration of maximal K-edge-connected subgraphs in large networks by random contraction , 2013, CIKM.

[4]  Vladimir Batagelj,et al.  An O(m) Algorithm for Cores Decomposition of Networks , 2003, ArXiv.

[5]  Moses Charikar,et al.  Greedy approximation algorithms for finding dense components in a graph , 2000, APPROX.

[6]  Hisao Tamaki,et al.  Greedily Finding a Dense Subgraph , 2000, J. Algorithms.

[7]  Sergei Vassilvitskii,et al.  Densest Subgraph in Streaming and MapReduce , 2012, Proc. VLDB Endow..

[8]  Christos Faloutsos,et al.  EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs , 2009, 2009 IEEE International Conference on Data Mining Workshops.

[9]  Venkatesan Guruswami,et al.  CopyCatch: stopping group attacks by spotting lockstep behavior in social networks , 2013, WWW.

[10]  Gregory Buehrer,et al.  A scalable pattern mining approach to web graph compression with communities , 2008, WSDM '08.

[11]  Robert E. Tarjan,et al.  A Fast Parametric Maximum Flow Algorithm and Applications , 1989, SIAM J. Comput..

[12]  Marco Pellegrini,et al.  Extraction and classification of dense communities in the web , 2007, WWW '07.

[13]  Jeffrey Xu Yu,et al.  Efficient Core Maintenance in Large Dynamic Graphs , 2012, IEEE Transactions on Knowledge and Data Engineering.

[14]  Yang Xiang,et al.  3-HOP: a high-compression indexing scheme for reachability query , 2009, SIGMOD Conference.

[15]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[16]  Yousef Saad,et al.  Dense Subgraph Extraction with Application to Community Detection , 2012, IEEE Transactions on Knowledge and Data Engineering.

[17]  Jeffrey Xu Yu,et al.  Finding maximal cliques in massive networks , 2011, TODS.

[18]  James Cheng,et al.  Efficient core decomposition in massive networks , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[19]  Samir Khuller,et al.  On Finding Dense Subgraphs , 2009, ICALP.

[20]  Kumar Chellapilla,et al.  Finding Dense Subgraphs with Size Bounds , 2009, WAW.

[21]  Clifford Stein,et al.  Introduction to Algorithms, 2nd edition. , 2001 .

[22]  Refael Hassin,et al.  Complexity of finding dense subgraphs , 2002, Discret. Appl. Math..

[23]  Stanley Wasserman,et al.  Social Network Analysis: Methods and Applications , 1994, Structural analysis in the social sciences.

[24]  David F. Gleich,et al.  Vertex neighborhoods, low conductance cuts, and good seeds for local community methods , 2012, KDD.

[25]  S. E. Schaeffer Survey Graph clustering , 2007 .

[26]  Serafim Batzoglou,et al.  MotifCut: regulatory motifs finding with maximum density subgraphs , 2006, ISMB.

[27]  Divesh Srivastava,et al.  Dense subgraph maintenance under streaming edge weight updates for real-time story identification , 2012, The VLDB Journal.

[28]  Leandros Tassiulas,et al.  Detecting Influential Spreaders in Complex, Dynamic Networks , 2013, Computer.

[29]  Samir Khuller,et al.  Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs , 2010, RECOMB.

[30]  Ravi Kumar,et al.  Discovering Large Dense Subgraphs in Massive Graphs , 2005, VLDB.

[31]  John E. Hopcroft,et al.  On the separability of structural classes of communities , 2012, KDD.

[32]  Edith Cohen,et al.  Reachability and distance queries via 2-hop labels , 2002, SODA '02.

[33]  Stephen B. Seidman,et al.  Network structure and minimum degree , 1983 .

[34]  Charalampos E. Tsourakakis Mathematical and Algorithmic Analysis of Network and Biological Data , 2014, ArXiv.

[35]  Christos Faloutsos,et al.  Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective , 2014, 2014 IEEE International Conference on Data Mining.

[36]  Jia Wang,et al.  Truss Decomposition in Massive Networks , 2012, Proc. VLDB Endow..

[37]  Andrew V. Goldberg,et al.  Finding a Maximum Density Subgraph , 1984 .

[38]  Weifa Liang,et al.  Efficiently computing k-edge connected components via graph decomposition , 2013, SIGMOD '13.