Towards a Better Quality Metric for Graph Cluster Evaluation

The process of discovering groups of similar vertices in a graph, known as graph clustering, has interesting applications in many different scenarios, such as marketing and recommendation systems. One of the most important aspects of graph clustering is the evaluation of cluster quality, which matters not only for measuring the effectiveness of clustering algorithms, but also for giving insights into the dynamics of relationships in a given network. Many quality metrics for graph clustering evaluation exist, but the most popular ones have strong biases and structural inconsistencies that make their results doubtful at best. Our studies showed that, while those popular quality metrics generally do a good job of evaluating the external sparsity between clusters, they do poorly when evaluating the internal density of those clusters: they either ignore essential information (such as a cluster's vertex count) or have their internal density component ignored in practice because of its computational cost. In this article, we propose a new method for evaluating the internal density of a given cluster, one that not only uses more complete information to evaluate that density, but also takes into consideration structural characteristics of the original graph. With our proposed method, the internal density of a cluster is evaluated in terms of the expected density of similar clusters in that same graph, in contrast to the traditional quality metrics, under which clusters from different graphs are judged by the same standards. We believe that, if used in conjunction with a good external sparsity evaluation metric such as conductance, this method will help to obtain better, more significant graph clustering evaluation results.
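The abstract judges a cluster's internal density against "the expected density of similar clusters in that same graph" and pairs it with conductance for external sparsity. The sketch below illustrates that idea under our own assumptions, not necessarily the procedure the article proposes: "similar clusters" are taken to be connected vertex sets with the same vertex count, drawn by a hypothetical random-expansion sampler, and the comparison is a plain ratio. All function names (`internal_density`, `conductance`, `sample_connected_set`, `expected_density`, `relative_density`, `two_cliques`) are illustrative, and conductance uses the common definition cut(S) / min(vol(S), vol(V - S)).

```python
import random
from itertools import combinations
from typing import Dict, Optional, Set

# Undirected graph as an adjacency map: vertex -> set of neighbours.
Graph = Dict[int, Set[int]]


def internal_density(adj: Graph, cluster: Set[int]) -> float:
    """Edges inside `cluster` divided by the n*(n-1)/2 possible vertex pairs."""
    n = len(cluster)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(cluster, 2) if v in adj[u])
    return edges / (n * (n - 1) / 2)


def conductance(adj: Graph, cluster: Set[int]) -> float:
    """Cut edges divided by min(vol(S), vol(V - S)); lower means sparser boundary."""
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    vol_in = sum(len(adj[u]) for u in cluster)
    vol_out = sum(len(nbrs) for nbrs in adj.values()) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0  # 0.0 by convention when undefined


def sample_connected_set(adj: Graph, size: int, rng: random.Random) -> Optional[Set[int]]:
    """Hypothetical sampler: grow a connected set from a random seed by
    repeatedly adding a random neighbour of the current set."""
    seed = rng.choice(sorted(adj))
    chosen = {seed}
    frontier = set(adj[seed])
    while len(chosen) < size and frontier:
        v = rng.choice(sorted(frontier))
        chosen.add(v)
        frontier |= adj[v]
        frontier -= chosen
    # None if the seed's component has fewer than `size` vertices.
    return chosen if len(chosen) == size else None


def expected_density(adj: Graph, size: int, samples: int = 200, seed: int = 0) -> float:
    """Mean internal density of sampled connected sets with `size` vertices."""
    rng = random.Random(seed)
    densities = []
    for _ in range(samples):
        s = sample_connected_set(adj, size, rng)
        if s is not None:
            densities.append(internal_density(adj, s))
    return sum(densities) / len(densities) if densities else 0.0


def relative_density(adj: Graph, cluster: Set[int], samples: int = 200, seed: int = 0) -> float:
    """Observed internal density over the graph-specific expected density."""
    expected = expected_density(adj, len(cluster), samples, seed)
    return internal_density(adj, cluster) / expected if expected else float("nan")


def two_cliques() -> Graph:
    """Toy graph: cliques {0..3} and {4..7} joined by the single edge (3, 4)."""
    adj: Graph = {v: set() for v in range(8)}
    edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


if __name__ == "__main__":
    g = two_cliques()
    c = {0, 1, 2, 3}
    print(internal_density(g, c))  # 1.0
    print(conductance(g, c))  # 1/13: one cut edge, vol(S) = 13
    print(relative_density(g, c))  # >= 1: denser than a typical connected 4-set here
```

A ratio above 1 suggests the cluster is denser than same-sized connected groups typically are in that particular graph, so the same raw density can score differently in a sparse graph and in a dense one, which is the kind of graph-relative comparison the abstract argues for. The random-expansion sampler is biased toward well-connected regions, so a real evaluation would need a more carefully designed sampling scheme.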
