Scalable stochastic block partition

The processing of graph data at large scale, though important and useful for real-world applications, continues to be challenging, particularly for problems such as graph partitioning. Optimal graph partitioning is NP-hard, but several methods provide approximate solutions in reasonable time. Yet scaling these approximate algorithms is also challenging. In this paper, we describe our efforts towards improving the scalability of one such technique, stochastic block partition, which is the baseline algorithm for the IEEE HPEC Graph Challenge [1]. Our key contributions are: improvements to the parallelization of the baseline bottom-up algorithm, especially the Markov Chain Monte Carlo (MCMC) nodal updates for Bayesian inference; a new top-down divide and conquer algorithm capable of reducing the algorithmic complexity of static partitioning and also suitable for streaming partitioning; a parallel single-node multi-CPU implementation and a parallel multi-node MPI implementation. Although our focus is on algorithmic scalability, our Python implementation obtains a speedup of 1.65× over the fastest baseline parallel C++ run at a graph size of 100k vertices divided into 8 subgraphs on a multi-CPU single node machine. It achieves a speedup of 61× over itself on a cluster of 4 machines with 256 CPUs for a 20k node graph divided into 4 subgraphs, and 441× speedup over itself on a 50k node graph divided into 8 subgraphs on a multi-CPU single node machine.

[1]  Tiago P. Peixoto Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models , 2013, Physical review. E, Statistical, nonlinear, and soft matter physics.

[2]  William Song,et al.  Streaming graph challenge: Stochastic block partition , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).

[3]  Joseph JáJá,et al.  A High Performance Implementation of Spectral Clustering on CPU-GPU Platforms , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[4]  Tiago P. Peixoto The entropy of stochastic blockmodel ensembles , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[5]  Gaël Varoquaux,et al.  The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.

[6]  Mario A. Storti,et al.  MPI for Python: Performance improvements and MPI-2 extensions , 2008, J. Parallel Distributed Comput..

[7]  Suzumura Toyotaro,et al.  An incremental local-first community detection method for dynamic graphs , 2016 .

[8]  J. Kiefer,et al.  Sequential minimax search for a maximum , 1953 .

[9]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[10]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[11]  Tiago P Peixoto,et al.  Parsimonious module inference in large networks. , 2012, Physical review letters.