Clubmark: A Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures

Clustering and community detection algorithms are key components of many data analysis and exploration systems, and a great diversity of them exists. To the best of our knowledge, however, no publicly available, uniform framework yet exists for the parallel benchmarking of diverse clustering algorithms on a wide range of synthetic and real-world datasets. In this paper, we introduce Clubmark, a new extensible framework that aims to fill this gap by providing a parallel isolation benchmarking platform for executing and evaluating clustering algorithms on NUMA servers. Clubmark allows fine-grained control over execution variables (timeouts, memory consumption, CPU affinity and cache policy) and supports the evaluation of a wide range of clustering algorithms, including multi-level, hierarchical and overlapping clustering techniques, on both weighted and unweighted input networks, with built-in evaluation of several extrinsic and intrinsic measures. Our framework is open-source and provides a consistent and systematic way to execute, evaluate and profile clustering techniques, taking into account a number of aspects that are often missing in state-of-the-art frameworks and benchmarking systems.
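To make the notion of isolated execution concrete, the following minimal Python sketch runs a single command under a timeout, an optional address-space limit and an optional CPU affinity set. It is an illustration only and not Clubmark's implementation: the function name `run_isolated` and its parameters are hypothetical, the sketch assumes a POSIX system (CPU pinning is applied only where `os.sched_setaffinity` is available, e.g. on Linux), and it covers neither cache policy nor NUMA-aware placement.

```python
import os
import resource
import subprocess
import sys


def run_isolated(cmd, timeout_s, mem_limit_bytes=None, cpus=None):
    """Run cmd in a child process with execution constraints.

    Hypothetical helper for illustration (not part of Clubmark).
    Returns (returncode, stdout); returncode is None on timeout.
    """
    def restrict():
        # Runs in the child between fork and exec (POSIX only).
        if mem_limit_bytes is not None:
            # Cap the virtual address space of the child process.
            resource.setrlimit(resource.RLIMIT_AS,
                               (mem_limit_bytes, mem_limit_bytes))
        if cpus is not None and hasattr(os, "sched_setaffinity"):
            # Pin the child to the given set of logical CPUs.
            os.sched_setaffinity(0, cpus)

    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s, preexec_fn=restrict)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising.
        return None, ""
    return done.returncode, done.stdout
```

A call such as `run_isolated([sys.executable, "-c", "print('ok')"], timeout_s=30)` returns the exit code and captured output of the child, whereas a command that exceeds `timeout_s` is killed and reported with a return code of `None`. A benchmarking framework would additionally need to schedule many such jobs in parallel and account for their combined memory use, which this sketch does not attempt.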
