Betweenness Centrality in an HSA-enabled System

This paper studies different approaches to implementing betweenness centrality on a heterogeneous system. Betweenness centrality is an important kernel in graph processing. It exposes multiple levels of parallelism and offers many opportunities for optimization. We implement several versions of betweenness centrality on an AMD accelerated processing unit (APU). These include GPU-only implementations with two edge-distribution methods, GPU-side load balancing, and CPU-GPU load balancing using both a master-worker model with queue monitoring and a work-stealing model. We take advantage of recent developments in the Heterogeneous System Architecture (HSA), such as its unified virtual address space and support for a variety of atomic operations. We also apply different memory-scope and memory-ordering options to different synchronization scenarios. We compare the implementations, analyze their performance, and discuss important directions for future research.
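For reference, the sketch below shows a minimal sequential formulation of betweenness centrality based on Brandes' algorithm for unweighted graphs. It is illustrative only and is not the paper's APU implementation: the adjacency-list representation and the function name `betweenness` are our own assumptions. The forward (breadth-first search) and backward (dependency accumulation) phases are the sources of the multiple levels of parallelism mentioned above.

```cpp
#include <vector>
#include <queue>
#include <stack>

// Sequential Brandes' betweenness centrality on an unweighted,
// directed graph given as adjacency lists (illustrative sketch only).
std::vector<double> betweenness(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<double> bc(n, 0.0);

    for (int s = 0; s < n; ++s) {
        // Forward phase: BFS from s, counting shortest paths (sigma)
        // and recording predecessors along shortest paths.
        std::vector<std::vector<int>> pred(n);
        std::vector<long long> sigma(n, 0);
        std::vector<int> dist(n, -1);
        std::stack<int> order;   // vertices in non-decreasing distance from s
        std::queue<int> q;

        sigma[s] = 1;
        dist[s] = 0;
        q.push(s);
        while (!q.empty()) {
            int v = q.front(); q.pop();
            order.push(v);
            for (int w : adj[v]) {
                if (dist[w] < 0) {            // first visit of w
                    dist[w] = dist[v] + 1;
                    q.push(w);
                }
                if (dist[w] == dist[v] + 1) { // shortest path to w via v
                    sigma[w] += sigma[v];
                    pred[w].push_back(v);
                }
            }
        }

        // Backward phase: accumulate dependencies in reverse BFS order.
        std::vector<double> delta(n, 0.0);
        while (!order.empty()) {
            int w = order.top(); order.pop();
            for (int v : pred[w])
                delta[v] += static_cast<double>(sigma[v]) / sigma[w] * (1.0 + delta[w]);
            if (w != s)
                bc[w] += delta[w];
        }
    }
    return bc;
}
```

In GPU-oriented variants of this scheme, the per-source searches and the edge traversals within each BFS level are the levels of parallelism that edge-distribution and load-balancing strategies, such as those studied in this paper, aim to exploit.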