Parallel Louvain Community Detection Algorithm Based on Dynamic Thread Assignment on Graphic Processing Unit

Background and Objectives: Louvain is a time-consuming community detection algorithm, especially on large-scale networks. Using a Graphics Processing Unit (GPU) to calculate the modularity sigma, which is a major processing step in the Louvain algorithm, can reduce execution time and make the algorithm practical for large-scale networks. Methods: The proposed algorithm, Dynamic CUDA Louvain Method (DCLM), assigns hardware threads to blocks dynamically on the cores inside the GPU. By considering the properties of the GPU, the algorithm sets the number of threads in a block to the maximal number of processing cores in each Streaming Multiprocessor (SM). If the number of nodes in the graph is smaller than the total number of physical cores on the GPU, the number of threads per block is equal to the ratio of the number of graph nodes to the number of SMs. Results: The implementation results demonstrate that the proposed algorithm decreases the run time by 15% in comparison with the best previous method on the large-scale graph. Conclusion: We have introduced DCLM, a GPU-based algorithm that accelerates Louvain community detection. Dynamic allocation of threads to each block has a significant effect on reducing the execution time. However, increasing the number of threads per block alone does not accelerate the calculations.

Copyright © 2021 The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, as long as the original authors and source are cited. No permission is required from the authors or the publishers.
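
The thread-assignment rule described in the Methods can be illustrated with a minimal host-side sketch. It is not the authors' implementation. The function names, the choice of one thread per graph node, the rounding up of fractional results, and the device figures in the usage note are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def threads_per_block(num_nodes: int, num_sms: int, cores_per_sm: int) -> int:
    """Choose a block size following the rule stated in the Methods.

    All parameter names are hypothetical; on a real device the SM count
    and cores per SM would be taken from the device properties.
    """
    if num_nodes <= 0 or num_sms <= 0 or cores_per_sm <= 0:
        raise ValueError("all arguments must be positive")

    total_cores = num_sms * cores_per_sm
    if num_nodes < total_cores:
        # Small graph: threads per block = nodes / SMs.
        # Rounding up is an assumption; the abstract only states the ratio.
        return math.ceil(num_nodes / num_sms)
    # Large graph: one thread per physical core of an SM.
    return cores_per_sm


def launch_config(num_nodes: int, num_sms: int, cores_per_sm: int) -> tuple[int, int]:
    """Return (blocks, threads_per_block), assuming one thread per node."""
    tpb = threads_per_block(num_nodes, num_sms, cores_per_sm)
    blocks = math.ceil(num_nodes / tpb)
    return blocks, tpb
```

For a hypothetical device with 20 SMs of 128 cores each (2,560 cores in total), `launch_config(1000, 20, 128)` gives 20 blocks of 50 threads, while `launch_config(1_000_000, 20, 128)` gives 7,813 blocks of 128 threads.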