Is There a Best Quality Metric for Graph Clusters?

Graph clustering, the process of discovering groups of similar vertices in a graph, is an active area of study with applications in many different domains. One of the most important aspects of graph clustering is the evaluation of cluster quality, which matters not only for measuring the effectiveness of clustering algorithms but also for giving insight into the dynamics of relationships in a given network. Many quality metrics for graph clustering have been proposed in the literature, but there is no consensus on how they compare to each other or how well they perform on different kinds of graphs. In this work we study five major graph clustering quality metrics in terms of their formal biases and their behavior when applied to clusters found by four implementations of classic graph clustering algorithms on five large, real-world graphs. Our results show that these popular quality metrics have strong biases toward incorrectly awarding good scores to some kinds of clusters, especially in larger networks. They also indicate that currently used clustering algorithms and quality metrics do not behave as expected when cluster structures differ from the more traditional, clique-like ones.
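The abstract does not name the five metrics it studies. As an illustration only, and assuming two widely used metrics (Newman and Girvan's modularity and conductance) are representative of the kind being compared, the sketch below shows what a cluster-quality metric computes on an undirected, unweighted graph given as an edge list. Higher modularity and lower conductance are conventionally read as "better" clusters. The helper names `modularity` and `conductance` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman-Girvan modularity Q of a partition (undirected, unweighted graph).

    Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where m is the
    number of edges, L_c the number of edges inside c, and d_c the sum of the
    degrees of the vertices in c. Assumes the graph has at least one edge.
    """
    m = len(edges)
    label = {v: i for i, comm in enumerate(communities) for v in comm}
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: total degree of vertices in c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in range(len(communities)))


def conductance(edges, cluster):
    """Conductance of a cluster S: cut(S) / min(vol(S), vol(complement of S)).

    cut(S) counts edges with exactly one endpoint in S; vol() sums degrees.
    """
    cluster = set(cluster)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for w in (u, v):
            if w in cluster:
                vol_in += 1
            else:
                vol_out += 1
        if (u in cluster) != (v in cluster):
            cut += 1
    denom = min(vol_in, vol_out)
    if denom == 0:
        # Degenerate cluster (empty, or the whole graph): conductance undefined.
        raise ValueError("conductance is undefined for this cluster")
    return cut / denom


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(round(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 3))  # 0.357
    print(round(conductance(edges, {0, 1, 2}), 3))              # 0.143
```

On this toy graph the two-triangle split scores Q = 5/14 and conductance 1/7, while putting every vertex in one community gives Q = 0. Both numbers depend only on edge counts and degrees, which is one reason such metrics can favor some cluster shapes over others.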
