FOR CLOSENESS : ADJUSTING NORMALIZED MUTUAL INFORMATION MEASURE FOR CLUSTERING COMPARISON

Normalized mutual information (NMI) is a widely used measure for comparing community detection methods. Recently, however, it has been argued that information-theoretic measures need adjustment because of the so-called selection bias problem: they tend to favor clustering solutions with more communities. In this article, these measures are evaluated experimentally to investigate the problem in depth, and an adjustment that scales their values is proposed. Experiments on synthetic networks, for which the ground-truth division is known, show that scaled NMI does not exhibit selection bias. Moreover, a comparison of several well-known community detection methods on synthetically generated networks shows that scaled NMI behaves more fairly, especially when the network topology lacks a clear community structure. Experiments on two real-world networks further show that the corrected formula makes it possible to choose, from a set of methods, the one whose network division best reflects the ground-truth structure.
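The selection bias described above can be illustrated with a minimal sketch. It computes plain, unscaled NMI and is not the scaled measure proposed in the article, whose formula is not given here. The sum normalization 2I(A;B)/(H(A)+H(B)), the helper names, the 100-node ground truth with four equal communities, and the use of uniformly random partitions are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labellings of the same nodes."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (nij / n) * math.log(nij * n / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def nmi(a, b):
    """Unscaled NMI with sum normalization (an assumed variant)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both labellings put every node in one cluster
    return 2.0 * mutual_information(a, b) / (h_a + h_b)


def average_random_nmi(truth, k, trials=200, seed=0):
    """Mean NMI between `truth` and random partitions with up to k clusters."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        candidate = [rng.randrange(k) for _ in truth]
        total += nmi(truth, candidate)
    return total / trials


# Hypothetical ground truth: 100 nodes in 4 communities of 25 nodes each.
truth = [node // 25 for node in range(100)]

if __name__ == "__main__":
    for k in (2, 10, 50):
        print(k, round(average_random_nmi(truth, k), 3))
```

The random partitions carry no information about the ground truth, yet under these assumptions their average NMI grows with the number of clusters. This is the behavior that an adjustment for selection bias is meant to remove.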
