Scalable Spectral Clustering for Overlapping Community Detection in Large-Scale Networks

While the majority of methods for community detection produce disjoint communities of nodes, most real-world networks naturally involve overlapping communities. In this paper, a scalable method for the detection of overlapping communities in large networks is proposed. The method is based on an extension of the notion of normalized cut to cope with overlapping communities. A spectral clustering algorithm is formulated to solve the related cut minimization problem. When available, the algorithm may take into account prior information about the likelihood for each node to belong to several communities. This information can either be extracted from the available metadata or from node centrality measures. We also introduce a hierarchical version of the algorithm to automatically detect the number of communities. In addition, a new benchmark model extending the stochastic blockmodel for graphs with overlapping communities is formulated. Our experiments show that the proposed spectral method outperforms the state-of-the-art algorithms in terms of computational complexity and accuracy on our benchmark graph model and on five real-world networks, including a lexical network and large-scale social networks. The scalability of the proposed algorithm is also demonstrated on large synthetic graphs with millions of nodes and edges.

[1]  Kathryn B. Laskey,et al.  Stochastic blockmodels: First steps , 1983 .

[2]  Xiaoheng Deng,et al.  Finding overlapping communities based on Markov chain and link clustering , 2016, Peer-to-Peer Networking and Applications.

[3]  T.S.Evans,et al.  Line graphs of weighted networks for overlapping communities , 2009, 0912.4389.

[4]  P. Latouche,et al.  Overlapping stochastic block models with application to the French political blogosphere , 2009, 0910.2098.

[5]  Boleslaw K. Szymanski,et al.  Overlapping community detection in networks: The state-of-the-art and comparative study , 2011, CSUR.

[6]  Richard M. Karp,et al.  Algorithms for graph partitioning on the planted partition model , 2001, Random Struct. Algorithms.

[7]  Andrea Lancichinetti,et al.  Detecting the overlapping and hierarchical community structure in complex networks , 2008, 0802.1218.

[8]  Padhraic Smyth,et al.  A Spectral Clustering Approach To Finding Communities in Graph , 2005, SDM.

[9]  Douglas R. White,et al.  Role models for complex networks , 2007, 0708.0958.

[10]  Boleslaw K. Szymanski,et al.  Towards Linear Time Overlapping Community Detection in Social Networks , 2012, PAKDD.

[11]  Jianwu Dang,et al.  Combined node and link partitions method for finding overlapping communities in complex networks , 2015, Scientific Reports.

[12]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  W. Zachary,et al.  An Information Flow Model for Conflict and Fission in Small Groups , 1977, Journal of Anthropological Research.

[15]  Sune Lehmann,et al.  Link communities reveal multiscale complexity in networks , 2009, Nature.

[16]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[17]  Stephen Roberts,et al.  Overlapping community detection using Bayesian non-negative matrix factorization. , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[18]  Robin Sibson,et al.  SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method , 1973, Comput. J..

[19]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[20]  Rok Sosic,et al.  SNAP , 2016, ACM Trans. Intell. Syst. Technol..

[21]  Huawei Shen,et al.  Quantifying and identifying the overlapping community structure in networks , 2009, 0905.2666.

[22]  U. Brandes A faster algorithm for betweenness centrality , 2001 .

[23]  Malik Magdon-Ismail,et al.  SSDE-Cluster: Fast Overlapping Clustering of Networks Using Sampled Spectral Distance Embedding and GMMs , 2011, 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing.

[24]  Wei Chen,et al.  A game-theoretic framework to identify overlapping communities in social networks , 2010, Data Mining and Knowledge Discovery.

[25]  Inderjit S. Dhillon,et al.  Non-exhaustive, Overlapping k-means , 2015, SDM.

[26]  C. Lanczos An iteration method for the solution of the eigenvalue problem of linear differential and integral operators , 1950 .

[27]  Jérôme Kunegis,et al.  KONECT: the Koblenz network collection , 2013, WWW.

[28]  Malik Magdon-Ismail,et al.  Finding communities by clustering a graph into overlapping subgraphs , 2005, IADIS AC.

[29]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[30]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[31]  Mark Newman,et al.  Networks: An Introduction , 2010 .

[32]  J. Kumpula,et al.  Sequential algorithm for fast clique percolation. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[33]  T. Nepusz,et al.  Fuzzy communities and the concept of bridgeness in complex networks. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[34]  Shihua Zhang,et al.  Identification of overlapping community structure in complex networks using fuzzy c-means clustering , 2007 .

[35]  I. Good On the Application of Symmetric Dirichlet Distributions and their Mixtures to Contingency Tables , 1976 .

[36]  T. Vicsek,et al.  Weighted network modules , 2007, cond-mat/0703706.

[37]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.