Community Detection Algorithm Evaluation using Size and Hashtags

Understanding community structure in social media is critical because of its broad applications, such as friend recommendation, link prediction, and collaborative filtering. However, there is no widely accepted definition of community in the literature. Existing work uses structure-related metrics such as modularity and function-related metrics such as ground truth to measure the performance of community detection algorithms, while ignoring an important metric: the size of the community. [1] suggests that the size of a community with strong ties in social media should be limited to 150. As we show in this paper, the majority of the communities obtained by many popular community detection algorithms are either very small or very large. Communities that are too small have little practical value, while communities that are too large contain weak connections and are therefore not stable. In this paper, we compare various community detection algorithms on the following metrics: size of the communities, coverage of the communities, extended modularity, triangle participation ratio, and user interest within the same community. We also propose a simple clique-based algorithm for community detection as a baseline for the comparison. Experimental results show that both our proposed algorithm and the well-accepted disjoint algorithm InfoMap perform well on all of these metrics.
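To make three of the listed metrics concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes common definitions that the abstract itself does not spell out: coverage is taken as the fraction of nodes that belong to at least one community, and the triangle participation ratio (TPR) of a community as the fraction of its members that lie in at least one triangle formed entirely inside that community. The function names, the `min_size` threshold, and the toy graph are hypothetical; only the upper bound of 150 comes from the text above.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def coverage(adj, communities):
    """Fraction of graph nodes assigned to at least one community (assumed definition)."""
    if not adj:
        return 0.0
    covered = set()
    for community in communities:
        covered |= set(community)
    return len(covered & set(adj)) / len(adj)


def triangle_participation_ratio(adj, community):
    """Fraction of members in at least one triangle lying inside the community."""
    members = set(community)
    if not members:
        return 0.0
    in_triangle = 0
    for u in members:
        inside_neighbors = adj.get(u, set()) & members
        # u is in a triangle if any two of its in-community neighbors are adjacent
        if any(w in adj[v] for v, w in combinations(inside_neighbors, 2)):
            in_triangle += 1
    return in_triangle / len(members)


def fraction_in_size_range(communities, min_size=3, max_size=150):
    """Share of communities whose size is in [min_size, max_size].

    max_size=150 follows the limit cited in the abstract; min_size is a
    hypothetical cutoff chosen only for illustration.
    """
    if not communities:
        return 0.0
    ok = sum(1 for c in communities if min_size <= len(set(c)) <= max_size)
    return ok / len(communities)


# Toy graph: a triangle 1-2-3 with a tail 3-4-5.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
toy_adj = build_adjacency(toy_edges)
toy_communities = [{1, 2, 3, 4}, {4, 5}]
```

On the toy graph, `triangle_participation_ratio(toy_adj, {1, 2, 3, 4})` counts nodes 1, 2, and 3 but not node 4, and `fraction_in_size_range` rejects the two-node community under the illustrative `min_size=3`. Overlapping communities are allowed here, since a node may appear in more than one set.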

[1]  S. Fortunato, et al.  Resolution limit in community detection, 2006, Proceedings of the National Academy of Sciences.

[2]  M. Newman, et al.  Finding community structure in very large networks, 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[3]  M. Newman, et al.  Finding community structure in networks using the eigenvectors of matrices, 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[4]  Steve Harenberg, et al.  Community detection in large-scale networks: a survey and empirical evaluation, 2014.

[5]  Santo Fortunato, et al.  Community detection in graphs, 2009, ArXiv.

[6]  Jean-Loup Guillaume, et al.  Fast unfolding of communities in large networks, 2008, 0803.0476.

[7]  M E J Newman, et al.  Finding and evaluating community structure in networks, 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[8]  Jure Leskovec, et al.  Defining and Evaluating Network Communities Based on Ground-Truth, 2012, ICDM.

[9]  Réka Albert, et al.  Near linear time algorithm to detect community structures in large-scale networks, 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  M E J Newman, et al.  Fast algorithm for detecting community structure in networks, 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[11]  Jure Leskovec, et al.  Empirical comparison of algorithms for network community detection, 2010, WWW '10.

[12]  Martin Rosvall, et al.  Maps of random walks on complex networks reveal community structure, 2007, Proceedings of the National Academy of Sciences.

[13]  R I M Dunbar, et al.  Do online social media cut through the constraints that limit the size of offline social networks?, 2016, Royal Society Open Science.

[14]  M E J Newman, et al.  Community structure in social and biological networks, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[15]  Mao-Bin Hu, et al.  Detect overlapping and hierarchical community structure in networks, 2008, ArXiv.

[16]  Xingyi Zhang, et al.  Overlapping Community Detection based on Network Decomposition, 2016, Scientific Reports.

[17]  T. Vicsek, et al.  Uncovering the overlapping community structure of complex networks in nature and society, 2005, Nature.

[18]  Tamás Vicsek, et al.  Modularity measure of networks with overlapping communities, 2009, 0910.5072.

[19]  Buzhou Tang, et al.  Overlapping community detection in networks with positive and negative links, 2014.