Overlapping Community Detection via Local Spectral Clustering

Large graphs arise in many contexts, and understanding their structure and extracting information from them is an important research area. Early community-mining algorithms focused on global structure and often run in time that grows with the size of the entire graph. Now that we routinely explore networks with billions of vertices and look for communities with only hundreds of members, it is crucial to shift attention from the macroscopic to the microscopic structure of large networks. A growing body of work adopts local expansion methods to identify the members of a community from a few exemplary seed members. In this paper, we propose a novel approach for finding overlapping communities called LEMON (Local Expansion via Minimum One Norm). The algorithm finds the community by seeking a sparse vector in the span of the local spectra such that the seeds are in its support. We show that LEMON can achieve the highest detection accuracy among state-of-the-art proposals. Its running time depends on the size of the community rather than that of the entire graph. The algorithm is easy to implement and highly parallelizable. We further provide a theoretical analysis of the local spectral properties, bounding a measure of the tightness of the extracted community in terms of the eigenvalues of the graph Laplacian. Moreover, because networks are not all similar in nature, and a comprehensive analysis of how well the local expansion approach uncovers communities in different networks is still lacking, we thoroughly evaluate our approach on both synthetic and real-world datasets across different domains and analyze the empirical variations that arise when the method is applied to inherently different networks in practice. In addition, we provide heuristics on how the quality and quantity of the seed set affect performance.
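The core idea stated above, a sparse vector in the span of the local spectra whose support contains the seeds, can be illustrated with a small linear program. What follows is a minimal sketch under stated assumptions, not the authors' implementation: the basis is taken from a few successive random-walk distributions started at the seeds, dense NumPy matrices over the whole graph are used for brevity (so the sketch does not reproduce the locality of the running time), and the function names, the parameters `start` and `dim`, and the toy two-clique graph are all hypothetical. The graph is assumed to have no isolated vertices.

```python
import numpy as np
from scipy.optimize import linprog


def local_spectral_basis(adj, seeds, start=2, dim=3):
    """Orthonormal basis spanning `dim` successive random-walk
    distributions from the seeds (an assumed stand-in for the
    "local spectra"; `start` and `dim` are illustrative)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    walk = adj / deg[:, None]          # row-stochastic transition matrix
    p = np.zeros(n)
    p[seeds] = 1.0 / len(seeds)        # uniform start on the seed set
    for _ in range(start):             # burn-in steps of the walk
        p = walk.T @ p
    vectors = []
    for _ in range(dim):               # collect dim consecutive distributions
        vectors.append(p)
        p = walk.T @ p
    basis, _ = np.linalg.qr(np.column_stack(vectors))
    return basis


def min_one_norm_vector(basis, seeds):
    """Solve  min sum(y)  s.t.  y = basis @ x,  y >= 0,  y[seeds] >= 1.
    Because y is nonnegative, sum(y) equals its one norm."""
    n, dim = basis.shape
    lower = np.zeros(n)
    lower[seeds] = 1.0
    res = linprog(
        c=basis.sum(axis=0),           # sum(basis @ x) written in terms of x
        A_ub=-basis,                   # -basis @ x <= -lower  <=>  y >= lower
        b_ub=-lower,
        bounds=[(None, None)] * dim,   # coefficients x are unconstrained
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return basis @ res.x


def two_cliques(size=5):
    """Toy graph: two cliques of `size` vertices joined by one edge."""
    n = 2 * size
    adj = np.zeros((n, n))
    adj[:size, :size] = 1.0
    adj[size:, size:] = 1.0
    np.fill_diagonal(adj, 0.0)
    adj[size - 1, size] = adj[size, size - 1] = 1.0
    return adj


adj = two_cliques()
seeds = [0, 1]
basis = local_spectral_basis(adj, seeds)
y = min_one_norm_vector(basis, seeds)
ranking = np.argsort(-y)               # vertices ordered by decreasing score
print(ranking, np.round(y, 3))
```

In this sketch the entries of `y` act as membership scores: vertices are ranked by decreasing value, and a candidate community would be read off the top of the ranking. How many vertices to keep is a separate decision that the sketch does not address.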
