A New Information-Theoretical Distance Measure for Evaluating Community Detection Algorithms

Community detection is a research area in network science dealing with the investigation of complex networks, such as social or biological networks, aiming to identify subgroups (communities) of entities (nodes) that are more closely related to each other within a community than to the remaining entities in the network. Various community detection algorithms have been developed and used in the literature; however, evaluating automatically detected community structures is a challenging task because the results vary across scenarios. Current evaluation measures that compare an extracted community structure with a reference structure, or ground truth, suffer from various drawbacks, some of which have been pointed out in the literature. Information-theoretic measures form a fundamental class in this domain and have recently received increasing interest. However, even the widely used measures, normalized variation of information (NVI) and normalized information distance (NID), share some limitations; in particular, they are biased toward the number of communities in the network. The main contribution of this paper is to introduce a new measure that overcomes this limitation while retaining the important properties of existing measures. We review the mathematical properties of our measure, which is based on the χ² divergence and inspired by the f-divergence measures of information theory. Theoretical properties as well as experimental results in various scenarios show the superiority of the proposed measure for evaluating community detection over those from the literature.
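As background, a minimal sketch of the standard definitions the abstract refers to; these are the usual textbook forms, and the exact construction and normalization of the proposed measure are given in the paper itself, not reproduced here:

% An f-divergence between discrete distributions P and Q, for a convex
% function f with f(1) = 0 (the family the proposed measure is drawn from):
\[
  D_f(P \,\|\, Q) \;=\; \sum_{x} q(x)\, f\!\left(\frac{p(x)}{q(x)}\right)
\]
% The chi-squared divergence is the member obtained with f(t) = (t - 1)^2:
\[
  \chi^2(P \,\|\, Q) \;=\; \sum_{x} \frac{\bigl(p(x) - q(x)\bigr)^2}{q(x)}
\]
% For comparison, the two baseline measures named in the abstract, for
% partitions U and V with mutual information I and entropies H:
\[
  \mathrm{NVI}(U, V) \;=\; 1 - \frac{I(U;V)}{H(U,V)},
  \qquad
  \mathrm{NID}(U, V) \;=\; 1 - \frac{I(U;V)}{\max\{H(U), H(V)\}}
\]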
