Evaluating accuracy of community detection using the relative normalized mutual information

Normalized Mutual Information (NMI) has been widely used to evaluate the accuracy of community detection algorithms. In this note we show that NMI is seriously affected by a systematic error due to the finite size of networks, and may give a wrong estimate of the performance of algorithms in some cases. A simple expression for estimating this error is derived and tested numerically. We propose a new measure of the accuracy of community detection, the relative Normalized Mutual Information (rNMI), which is the NMI minus the expected NMI of random partitions. This measure is very close to zero for two random partitions, even when they are short, so it overcomes this problem of NMI.
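The definition of rNMI as "NMI minus the expected NMI of random partitions" can be illustrated with a minimal Python sketch. Several choices here are assumptions and are not taken from the text above: NMI is normalized as 2I/(H(A)+H(B)), the random partitions are generated by shuffling the labels of the detected partition (which preserves its group sizes), and the expectation is estimated by Monte Carlo sampling rather than by an analytic expression. The function names `entropy`, `nmi`, and `rnmi` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of the group-size distribution of a partition."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI of two partitions, normalized as 2*I / (H(a) + H(b)) (assumed convention)."""
    n = len(a)
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in a single group
    joint = Counter(zip(a, b))
    h_ab = -sum((c / n) * math.log(c / n) for c in joint.values())
    mutual_information = h_a + h_b - h_ab
    return 2.0 * mutual_information / (h_a + h_b)


def rnmi(truth, found, samples=200, seed=0):
    """NMI(truth, found) minus a Monte Carlo estimate of the NMI between
    truth and random partitions.

    Assumption: a random partition is a random permutation of `found`,
    so it keeps the same group sizes as the detected partition.
    """
    rng = random.Random(seed)
    shuffled = list(found)
    total = 0.0
    for _ in range(samples):
        rng.shuffle(shuffled)
        total += nmi(truth, shuffled)
    return nmi(truth, found) - total / samples


if __name__ == "__main__":
    # Two unrelated random partitions of a small network with many groups:
    # the case in which the finite-size bias of NMI is expected to matter.
    rng = random.Random(1)
    truth = [rng.randrange(10) for _ in range(60)]
    found = [rng.randrange(10) for _ in range(60)]
    print("NMI :", nmi(truth, found))
    print("rNMI:", rnmi(truth, found))
```

Shuffling the detected labels is only one way to define the random baseline; drawing each label uniformly at random would be another, and the two can give different corrections when group sizes are uneven.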
