A weighted local view method based on observation over ground truth for community detection

Community detection is a fundamental problem in network analysis, and many methods have been proposed to discover communities. However, as networks grow rapidly in scale and diversity, the global modular organization of many large networks is often extremely difficult to recognize. In these cases, many existing methods fail to uncover the latent community structure because they discover communities from a global view of the network. In this paper, we propose a weighted local view method, motivated by an observation on ground-truth communities, to reveal community structure in large real networks. The method proceeds in three steps: 1) local seeding strategies choose a set of nodes that represent their neighboring nodes well; 2) each chosen node explores its community from its own local view of the whole network, using an improved approximate personalized PageRank-based community finder that builds on this observation about large real networks with ground-truth communities; 3) all explored local communities are merged to form the global community structure. We evaluate the weighted local view method against state-of-the-art community detection methods on large real networks with ground-truth communities. Experiments show that the proposed method not only improves the quality of the detected communities but also scales to very large networks with good computational efficiency relative to the compared methods, which indicates that the weighted local view method has great potential for overlapping community detection in large networks.
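The three steps above can be illustrated with a minimal Python sketch. The abstract does not specify the seeding strategies, the weighting that makes the community finder "improved", or the merge rule, so everything below is an assumption: `local_seeds` keeps local degree maxima (a hypothetical stand-in for the paper's seeding strategies), `approx_ppr` is the standard, unweighted push approximation of personalized PageRank in the style of Andersen, Chung and Lang, followed by a conductance sweep (`sweep_cut`), and `merge_communities` simply discards near-duplicate communities by Jaccard similarity. All function names and the parameters `alpha`, `eps` and `threshold` are illustrative, not the authors' implementation.

```python
from collections import deque

# Undirected graph as adjacency sets; assumes no self-loops.
Graph = dict[int, set[int]]


def from_edges(edges: list[tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def clique_edges(nodes: list[int]) -> list[tuple[int, int]]:
    """All edges of a clique on the given nodes (used by the demo below)."""
    return [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]


def local_seeds(graph: Graph) -> list[int]:
    """Hypothetical seeding rule: keep nodes whose degree is at least
    that of every neighbor (a local degree maximum)."""
    return [u for u, nbrs in graph.items()
            if nbrs and all(len(graph[v]) <= len(nbrs) for v in nbrs)]


def approx_ppr(graph: Graph, seed: int, alpha: float = 0.15,
               eps: float = 1e-4) -> dict[int, float]:
    """Push-based approximate personalized PageRank (non-lazy variant).
    Pushes residual mass until every node u has r[u] < eps * deg(u).
    Assumes the seed has at least one neighbor."""
    p: dict[int, float] = {}
    r: dict[int, float] = {seed: 1.0}
    queue, queued = deque([seed]), {seed}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        deg_u, ru = len(graph[u]), r.get(u, 0.0)
        if ru < eps * deg_u:
            continue
        p[u] = p.get(u, 0.0) + alpha * ru    # keep an alpha share at u
        r[u] = 0.0
        share = (1.0 - alpha) * ru / deg_u   # spread the rest to neighbors
        for v in graph[u]:
            r[v] = r.get(v, 0.0) + share
            if v not in queued and r[v] >= eps * len(graph[v]):
                queue.append(v)
                queued.add(v)
    return p


def sweep_cut(graph: Graph, p: dict[int, float]) -> tuple[set[int], float]:
    """Return the prefix of nodes, ordered by p[u] / deg(u), with the
    minimum conductance cut / min(vol, total_vol - vol)."""
    total_vol = sum(len(nbrs) for nbrs in graph.values())
    order = sorted(p, key=lambda u: p[u] / len(graph[u]), reverse=True)
    members: set[int] = set()
    vol = cut = 0
    best: set[int] = set()
    best_cond = float("inf")
    for u in order:
        inside = sum(1 for v in graph[u] if v in members)
        members.add(u)
        vol += len(graph[u])
        cut += len(graph[u]) - 2 * inside    # u's edges into the set stop being cut
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break
        if cut / denom < best_cond:
            best_cond, best = cut / denom, set(members)
    return best, best_cond


def merge_communities(communities: list[frozenset[int]],
                      threshold: float = 0.8) -> list[frozenset[int]]:
    """Hypothetical merge rule: drop a community whose Jaccard similarity
    with an already-kept (larger) community reaches the threshold."""
    kept: list[frozenset[int]] = []
    for c in sorted(communities, key=len, reverse=True):
        if all(len(c & k) / len(c | k) < threshold for k in kept):
            kept.append(c)
    return kept


def local_view_communities(graph: Graph, alpha: float = 0.15,
                           eps: float = 1e-4) -> list[frozenset[int]]:
    """Seed locally, grow one community per seed, then merge the results."""
    found = []
    for s in local_seeds(graph):
        community, _ = sweep_cut(graph, approx_ppr(graph, s, alpha, eps))
        if community:
            found.append(frozenset(community))
    return merge_communities(found)


if __name__ == "__main__":
    # Two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the bridge edge 3-4.
    g = from_edges(clique_edges([0, 1, 2, 3]) + [(3, 4)]
                   + clique_edges([4, 5, 6, 7]))
    print(sorted(sorted(c) for c in local_view_communities(g)))
```

On two 4-cliques joined by a single bridge edge, the script prints `[[0, 1, 2, 3], [4, 5, 6, 7]]`: both endpoints of the bridge are chosen as seeds, and each sweep stops at its own clique, whose conductance is 1/13. Because a push only touches nodes that its residual mass reaches, the work per seed depends on `eps`, `alpha` and local degrees rather than on the size of the whole network, which is the usual argument for why local methods scale to large networks.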
