Comparison and Benchmark of Graph Clustering Algorithms

Graph clustering is widely used in the analysis of biological networks, social networks, and other domains. Many graph clustering algorithms have been published over more than a decade, but a comprehensive and consistent performance comparison is not available. In this paper we benchmarked more than 70 graph clustering programs to evaluate their runtime and clustering quality on both weighted and unweighted graphs. We also analyzed which characteristics of the ground truth affect performance. Our work provides a starting point for engineers selecting clustering algorithms and a viewpoint for researchers designing new ones.
