A Statistical Density-Based Analysis of Graph Clustering Algorithm Performance

Measuring graph clustering quality remains an open problem. To address it, we introduce quality measures based on comparisons of intra- and inter-cluster densities, an accompanying statistical test of the significance of their differences, and a step-by-step routine for clustering quality assessment. Our null hypothesis does not rely on any generative model for the graph, unlike modularity, which uses the configuration model as a null model. Our measures are shown to meet the axioms of a good clustering quality function, which the widely used modularity measure does not. They also have an intuitive graph-theoretic interpretation and a formal statistical interpretation, and can be easily tested for significance. Our work is centered on the idea that well-clustered graphs display a significantly larger intra-cluster density than inter-cluster density. We develop tests to validate the existence of such a cluster structure. We empirically explore the behavior of our measures under a number of stress-test scenarios and compare it to that of the commonly used modularity and conductance measures. The stress-test results confirm that our measures compare very favorably to the established ones. In particular, they are more responsive to graph structure and less sensitive to sample size, to breakdowns during numerical implementation, and to uncertainty in connectivity. These features are especially important for larger data sets or when the data may contain errors in the connectivity patterns.
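To make the central idea concrete, the following is a minimal sketch of an intra- versus inter-cluster density comparison. The abstract does not state the exact formulas, so the definitions used here are assumptions: each density is pooled edge density, that is, the number of observed edges divided by the number of possible node pairs of that kind, for an undirected simple graph. The function name `cluster_densities` and its arguments are hypothetical, and the sketch does not implement the significance test described above.

```python
def cluster_densities(edges, clusters):
    """Return (intra_density, inter_density) for an undirected simple graph.

    edges:    iterable of (u, v) node pairs
    clusters: dict mapping every node to its cluster label

    Assumed definitions (not taken from the paper):
      intra = edges inside clusters / node pairs inside clusters
      inter = edges between clusters / node pairs between clusters
    """
    # Deduplicate undirected edges and ignore self-loops.
    edge_set = {frozenset((u, v)) for u, v in edges if u != v}

    # An edge is intra-cluster when both endpoints share a label.
    intra_edges = sum(
        1 for e in edge_set if len({clusters[n] for n in e}) == 1
    )
    inter_edges = len(edge_set) - intra_edges

    # Count cluster sizes to obtain the number of possible pairs.
    sizes = {}
    for label in clusters.values():
        sizes[label] = sizes.get(label, 0) + 1

    n = len(clusters)
    intra_pairs = sum(s * (s - 1) // 2 for s in sizes.values())
    inter_pairs = n * (n - 1) // 2 - intra_pairs

    intra = intra_edges / intra_pairs if intra_pairs else 0.0
    inter = inter_edges / inter_pairs if inter_pairs else 0.0
    return intra, inter


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
intra, inter = cluster_densities(edges, clusters)
print(f"intra-cluster density: {intra:.3f}")
print(f"inter-cluster density: {inter:.3f}")
```

In this toy graph all 6 within-cluster pairs are connected while only 1 of the 9 between-cluster pairs is, so the intra-cluster density is far larger than the inter-cluster density. Under the premise above, a well-clustered graph is one where this gap is statistically significant.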
