Metrics for Community Analysis: A Survey

Detecting and analyzing dense groups, or communities, in social and information networks has attracted immense attention over the last decade because of its wide applicability across domains. Community detection is an ill-defined problem, as the nature of the communities is not known in advance. The problem is further complicated by the fact that communities emerge in networks in various forms: disjoint, overlapping, hierarchical, and so on. Various heuristics have been proposed, depending on the application at hand. These heuristics have been materialized in the form of new metrics, which in most cases either serve as optimization functions for detecting the community structure or indicate the goodness of the detected communities during evaluation. This creates a need for an organized and detailed survey of the metrics proposed for community detection and evaluation. In this survey, we present a comprehensive and structured overview of the state-of-the-art metrics used for the detection and evaluation of community structure. We also conduct experiments on synthetic and real-world networks to present a comparative analysis of these metrics in measuring the goodness of the underlying community structure.
