Detecting dense communities in large social and information networks with the Core & Peel algorithm

Detecting and characterizing dense subgraphs (tight communities) in social and information networks is an important exploratory tool in social network analysis. Several approaches have been proposed that either (i) partition the whole network into clusters, even in low-density regions, or (ii) aim at finding a single densest community (and must be iterated to find the next one). As social networks grow larger, both approaches (i) and (ii) result in algorithms too slow to be practical, in particular when the data must be analyzed quickly. In this paper we propose an approach that balances efficiency of computation against the expressiveness and manageability of the output community representation. We define the notion of a partial dense cover (PDC) of a graph. Intuitively, a PDC of a graph is a collection of pairwise disjoint sets of nodes such that (a) each set induces a dense subgraph and (b) removing all the sets leaves a residual graph without dense regions. Exact computation of a PDC is an NP-complete problem; thus, we propose an efficient heuristic algorithm for computing a PDC, which we christen Core and Peel. Moreover, we propose a novel benchmarking technique that allows us to evaluate algorithms for computing a PDC using the classical IR concepts of precision and recall, even without a gold standard. Tests on 25 social and technological networks from the Stanford Large Network Dataset Collection confirm that Core and Peel is efficient and attains very high precision and recall.
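As an illustration of condition (a) in the PDC definition, the following minimal Python sketch checks that a candidate collection of node sets is pairwise disjoint and that each set induces a sufficiently dense subgraph. It is not the Core and Peel algorithm, and it does not test condition (b). The density measure (fraction of possible edges present), the threshold `min_density`, and the function names are assumptions made for this example; the abstract does not fix them.

```python
from itertools import combinations


def induced_density(adj, nodes):
    """Fraction of possible edges present in the subgraph induced by `nodes`.

    Assumption: density = |E(S)| / C(|S|, 2); the abstract does not
    specify which density measure is used.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj.get(u, set()))
    return edges / (n * (n - 1) / 2)


def satisfies_condition_a(adj, cover, min_density):
    """Check disjointness and density of every set in a candidate cover.

    `min_density` is a hypothetical threshold chosen by the caller.
    """
    seen = set()
    for nodes in cover:
        nodes = set(nodes)
        if seen & nodes:  # sets must be pairwise disjoint
            return False
        seen |= nodes
        if induced_density(adj, nodes) < min_density:
            return False
    return True


# Toy graph: a 4-clique {0,1,2,3} and a triangle {4,5,6} joined by edge 3-4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (5, 6), (3, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(satisfies_condition_a(adj, [{0, 1, 2, 3}, {4, 5, 6}], 0.9))
```

On this toy graph the two cliques pass the check, while a collection whose sets share node 3 is rejected for overlap.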
