A Statistical Performance Analysis of Graph Clustering Algorithms

Measuring the quality of a graph clustering remains an open problem. We introduce three statistical measures to address it. We empirically explore their behavior under a number of stress-test scenarios and compare them with the commonly used modularity and conductance. Our measures are robust, immune to the resolution limit, intuitive to interpret, and have a formal statistical interpretation. The stress-test results confirm that our measures compare favorably with the established ones: they are more responsive to graph structure, less sensitive to sample size and to breakdowns in numerical implementation, and less sensitive to uncertainty in connectivity. These properties are especially important for larger data sets and for data whose connectivity patterns may contain errors.
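For orientation, the two baseline measures named above can be sketched in a few lines. This is a minimal illustration that assumes the standard textbook definitions for an undirected, unweighted graph (Newman-Girvan modularity of a partition, and conductance of a single cluster as cut size over the smaller of the two volumes). It does not implement the three statistical measures introduced here, which the abstract does not define. The function names, the edge-list representation, and the toy graph are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, clusters):
    """Newman-Girvan modularity: sum over clusters of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    label = {v: i for i, members in enumerate(clusters) for v in members}
    internal = defaultdict(int)    # L_c: edges with both endpoints in cluster c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of the nodes in cluster c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(clusters))
    )


def conductance(edges, cluster):
    """Conductance of one cluster: cut edges / min(volume inside, volume outside)."""
    cluster = set(cluster)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for w in (u, v):
            if w in cluster:
                vol_in += 1
            else:
                vol_out += 1
        if (u in cluster) != (v in cluster):
            cut += 1
    denom = min(vol_in, vol_out)
    # Convention chosen for this sketch: an empty or all-covering cluster scores 0.
    return cut / denom if denom else 0.0


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
CLUSTERS = [{0, 1, 2}, {3, 4, 5}]

if __name__ == "__main__":
    print(modularity(EDGES, CLUSTERS))      # 5/14, about 0.357
    print(conductance(EDGES, CLUSTERS[0]))  # 1/7, about 0.143
```

On this toy graph each triangle has 3 internal edges and total degree 7 out of 2m = 14, which gives a modularity of 2 × (3/7 − 1/4) = 5/14. The single bridge edge gives each triangle a conductance of 1/7.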
