An Efficient Load Balancing Method for Tree Algorithms

Multiprocessing is now mainstream, with processor counts growing exponentially. Load balancing is therefore critical to the efficient execution of parallel algorithms. In this paper we consider the fundamental class of tree-based algorithms, which are notoriously irregular and hard to load-balance with existing static techniques. We propose a hybrid load balancing method that uses statistical random sampling to estimate the tree depth and node count distributions, and then uses these estimates to partition an input tree uniformly. As an initial performance study, we implemented the method on an Intel Xeon Phi accelerator system. We considered the tree traversal operation on both regular and irregular unbalanced trees, represented by Fibonacci trees and unbalanced (biased) randomly generated trees, respectively. The results show scalable performance for up to the 60 physical processors of the accelerator, as well as for an extrapolated 128-processor case.
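
To make the sampling idea concrete, the following is a minimal sketch of a classical random-probing estimator of tree size in the style of Knuth. It is not the implementation evaluated in this paper. The function names (`fib_tree`, `probe`, `estimate_size`), the encoding of a Fibonacci tree node by its order, and the probe counts are all assumptions made for illustration. Each probe walks one random root-to-leaf path and multiplies the branching factors it meets, which gives an unbiased estimate of the node count along with an observed depth.

```python
import random

def fib_tree(order):
    """Children of a node in a Fibonacci tree (assumed encoding: node = order).
    A node of order n >= 2 has subtrees of order n-1 and n-2; orders 0, 1 are leaves."""
    return [order - 1, order - 2] if order >= 2 else []

def exact_size(node, children):
    """Exact node count by full traversal (used only to check the estimate)."""
    return 1 + sum(exact_size(c, children) for c in children(node))

def probe(root, children, rng):
    """One random root-to-leaf walk; returns (size_estimate, depth_reached)."""
    estimate, weight, depth = 1, 1, 0
    node = root
    while True:
        kids = children(node)
        if not kids:
            return estimate, depth
        weight *= len(kids)      # product of branching factors so far
        estimate += weight       # estimated number of nodes on the next level
        node = rng.choice(kids)  # descend into a uniformly chosen child
        depth += 1

def estimate_size(root, children, probes, seed=0):
    """Average the size estimates of several independent probes."""
    rng = random.Random(seed)
    return sum(probe(root, children, rng)[0] for _ in range(probes)) / probes

if __name__ == "__main__":
    print("exact:    ", exact_size(10, fib_tree))
    print("estimated:", estimate_size(10, fib_tree, probes=20000))
```

Applying the same estimator to each candidate subtree would give approximate per-subtree node counts, which a partitioner could then use to assign roughly equal amounts of work to each processor. How the estimates are turned into a partition is left open in this sketch.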
