Defining and evaluating network communities based on ground-truth

Nodes in real-world networks, such as social, information or technological networks, organize into communities where edges appear with high concentration among the members of the community. Identifying communities in networks has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth. We study a set of 230 large social, collaboration and information networks where nodes explicitly define group memberships. We use these groups to define the notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate different definitions of network communities on a large scale. We choose 13 commonly used definitions of network communities and examine their quality, sensitivity and robustness. We show that the 13 definitions naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities.

[1]  Jure Leskovec,et al.  Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math..

[2]  Inderjit S. Dhillon,et al.  Weighted Graph Cuts without Eigenvectors A Multilevel Approach , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Krishna P. Gummadi,et al.  Measurement and analysis of online social networks , 2007, IMC '07.

[4]  S. Fortunato,et al.  Resolution limit in community detection , 2006, Proceedings of the National Academy of Sciences.

[5]  Jure Leskovec,et al.  Structure and Overlaps of Communities in Networks , 2012, KDD 2012.

[6]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.

[7]  Marina Meila,et al.  Comparing clusterings: an axiomatic view , 2005, ICML.

[8]  M. McPherson An Ecology of Affiliation , 1983 .

[9]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Jure Leskovec,et al.  Empirical comparison of algorithms for network community detection , 2010, WWW '10.

[11]  Claudio Castellano,et al.  Defining and identifying communities in networks. , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[12]  Shang-Hua Teng,et al.  Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems , 2003, STOC '04.

[13]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[14]  W. Zachary,et al.  An Information Flow Model for Conflict and Fission in Small Groups , 1977, Journal of Anthropological Research.

[15]  S. Feld The Focused Organization of Social Ties , 1981, American Journal of Sociology.

[16]  Sune Lehmann,et al.  Link communities reveal multiscale complexity in networks , 2009, Nature.

[17]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[18]  Fan Chung Graham,et al.  Local Graph Partitioning using PageRank Vectors , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[19]  Jure Leskovec,et al.  Community-Affiliation Graph Model for Overlapping Network Community Detection , 2012, 2012 IEEE 12th International Conference on Data Mining.

[20]  Mark S. Granovetter The Strength of Weak Ties , 1973, American Journal of Sociology.

[21]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[22]  R. Guimerà,et al.  Modularity from fluctuations in random graphs and complex networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[23]  John E. Hopcroft,et al.  On the separability of structural classes of communities , 2012, KDD.

[24]  Jure Leskovec,et al.  The life and death of online groups: predicting group growth and longevity , 2012, WSDM '12.

[25]  C. Lee Giles,et al.  Efficient identification of Web communities , 2000, KDD '00.

[26]  Yizhou Sun,et al.  Ranking-based clustering of heterogeneous information networks with star network schema , 2009, KDD.

[27]  H. Kuhn The Hungarian method for the assignment problem , 1955 .

[28]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[29]  Philip S. Yu,et al.  On selection of objective functions in multi-objective community detection , 2011, CIKM '11.

[30]  Kevin J. Lang,et al.  Communities from seed sets , 2006, WWW '06.

[31]  David F. Gleich,et al.  Neighborhoods are good communities , 2011, ArXiv.

[32]  Philip S. Yu,et al.  Community detection in incomplete information networks , 2012, WWW.

[33]  Boleslaw K. Szymanski,et al.  Overlapping community detection in networks: The state-of-the-art and comparative study , 2011, CSUR.

[34]  Jure Leskovec,et al.  The dynamics of viral marketing , 2005, EC '06.

[35]  Jure Leskovec,et al.  Overlapping community detection at scale: a nonnegative matrix factorization approach , 2013, WSDM.

[36]  S. Kiesler,et al.  Applying Common Identity and Bond Theory to Design of Online Communities , 2007 .

[37]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[38]  Ulrike von Luxburg,et al.  Clustering Stability: An Overview , 2010, Found. Trends Mach. Learn..

[39]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[40]  Leon Danon,et al.  Comparing community structure identification , 2005, cond-mat/0505245.

[41]  Satu Elisa Schaeffer,et al.  Graph Clustering , 2017, Encyclopedia of Machine Learning and Data Mining.

[42]  M E J Newman,et al.  Modularity and community structure in networks. , 2006, Proceedings of the National Academy of Sciences of the United States of America.