A Parallel LFR-like Benchmark for Evaluating Community Detection Algorithms

Community detection is one of the most widely used graph analytics. As recent community detection algorithms increasingly target large-scale networks, an emerging problem is how best to evaluate their output. Common measures such as modularity have several well-known issues, so comparisons against a notion of a “ground truth” community structure, such as that provided by the Lancichinetti-Fortunato-Radicchi (LFR) benchmark, are preferred. This work targets the parallel generation of graphs matching the specifications of the LFR benchmark. We are able to generate such graphs at the billion-edge scale in seconds, giving orders-of-magnitude speedup relative to prior work.