A BSP-Based Parallel Iterative Processing System with Multiple Partition Strategies for Big Graphs

Many real-world applications produce large amounts of data that can be modeled as a graph, and such a graph often has millions of vertices and billions of edges. This paper presents BC-BSP+, a BSP-based system for processing large graphs iteratively and in parallel. The system lets users configure policies (e.g., disk management parameters) and extend functions (e.g., programming interfaces), and it supports large-scale graph computation, fault tolerance, and load balancing. In particular, BC-BSP+ provides three graph partition strategies to support large graph processing: Randomized Hash Partition (RHP), Balanced Hash Partition (BHP), and Vertex-Cut based on the Range Partition method (VCRP). Extensive experiments are conducted to evaluate BC-BSP+. The results show that VCRP outperforms BHP, although BHP is more general. Compared with Hadoop, a MapReduce-based system, BC-BSP+ achieves a speedup of roughly 8. Compared with the BSP-based systems Hama and Giraph, it achieves a speedup of 2 to 6, owing to VCRP.
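
The abstract names hash-based and range-based partitioning but does not describe how they work inside BC-BSP+. As a minimal sketch of the two general ideas only, and not the paper's RHP, BHP, or VCRP algorithms, the following Python code assigns vertices to partitions by hashing vertex IDs and by contiguous ID ranges. The function names, the multiplicative hash constant, and the assumption that vertex IDs are the integers 0 to n-1 are all hypothetical and chosen for illustration.

```python
from collections import Counter

def hash_partition(vertex_id: int, num_partitions: int) -> int:
    """Generic hash partitioning: scatter vertex IDs across partitions.

    Uses a simple multiplicative hash (an illustrative choice, not taken
    from the paper) so that the result is deterministic across runs.
    """
    scrambled = (vertex_id * 2654435761) % (2 ** 32)
    return scrambled % num_partitions

def range_partition(vertex_id: int, num_vertices: int, num_partitions: int) -> int:
    """Generic range partitioning: contiguous ID ranges map to one partition.

    Assumes vertex IDs are the integers 0 .. num_vertices - 1.
    """
    chunk = -(-num_vertices // num_partitions)  # ceiling division
    return vertex_id // chunk

def partition_sizes(assign, vertex_ids) -> Counter:
    """Count how many vertices each partition receives under `assign`."""
    return Counter(assign(v) for v in vertex_ids)

if __name__ == "__main__":
    n, k = 10, 3
    ids = range(n)
    print("hash :", dict(partition_sizes(lambda v: hash_partition(v, k), ids)))
    print("range:", dict(partition_sizes(lambda v: range_partition(v, n, k), ids)))
```

Hashing tends to spread vertices evenly but ignores locality, while range partitioning keeps neighboring IDs together at the risk of uneven load. This trade-off is the usual motivation for comparing such strategies.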