Dynamic load balancing on heterogeneous multi-GPU systems

Current HPC systems are composed of multicore processors and powerful graphics processing units. Adapting existing code and libraries to these systems is a fundamental problem because of the significant increase in programming difficulty. Heterogeneity at both the architectural and the programming levels raises the programmability wall, and code performance is strongly affected by the tight interdependence between the code and the parallel architecture. We have developed a dynamic load balancing library that allows parallel code to be adapted to a wide variety of heterogeneous systems. The overhead introduced by our system is minimal, and the cost to the programmer is negligible. The system has been successfully applied to solve load imbalance problems arising in both homogeneous and heterogeneous multi-GPU platforms. We use Dynamic Programming as a case study to validate our proposals in different heterogeneous multi-GPU scenarios.
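The following is a minimal sketch of one common dynamic load-balancing policy. It is not the library's actual interface. It assumes an iterative code, for example a Dynamic Programming scheme whose table is filled stage by stage, with each stage's cells split among devices. After each iteration, the policy measures how long each device took and redistributes the work in proportion to the observed throughput, so that all devices should finish together. The class `DynamicBalancer`, the `tolerance` threshold, and the device speeds are hypothetical illustrations.

```python
from typing import List


def split_evenly(total: int, n: int) -> List[int]:
    """Homogeneous starting distribution: equal shares, remainder to the first devices."""
    base, extra = divmod(total, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


class DynamicBalancer:
    """Hypothetical helper: redistributes `total` work units among devices in
    proportion to the throughput each device showed in the last iteration."""

    def __init__(self, total: int, n_devices: int, tolerance: float = 0.05):
        self.total = total
        self.tolerance = tolerance  # relative imbalance tolerated before rebalancing
        self.shares = split_evenly(total, n_devices)
        self.speeds = [1.0] * n_devices  # last measured throughput (units/second)

    def update(self, times: List[float]) -> List[int]:
        """Feed the per-device times of the last iteration; return the next shares."""
        # Skip redistribution if the devices already finish close together,
        # so the balancing cost is only paid when it is likely to help.
        slowest, fastest = max(times), min(times)
        if slowest == 0 or (slowest - fastest) / slowest <= self.tolerance:
            return self.shares

        # Measured throughput; idle devices keep their previous estimate.
        for i, (share, t) in enumerate(zip(self.shares, times)):
            if share > 0 and t > 0:
                self.speeds[i] = share / t

        # Ideal fractional shares would make every device finish at the same time.
        total_speed = sum(self.speeds)
        ideal = [self.total * v / total_speed for v in self.speeds]

        # Round down, then hand out leftover units by largest remainder
        # so the shares still add up to exactly `total`.
        new = [int(x) for x in ideal]
        leftover = self.total - sum(new)
        by_remainder = sorted(range(len(ideal)), key=lambda i: ideal[i] - new[i], reverse=True)
        for i in by_remainder[:leftover]:
            new[i] += 1
        self.shares = new
        return new


# Hypothetical platform: a CPU and two GPUs of different speeds (units/second).
device_speeds = [100.0, 300.0, 600.0]


def simulate_times(shares: List[int]) -> List[float]:
    """Stand-in for timing real kernels: time = work / device speed."""
    return [s / v for s, v in zip(shares, device_speeds)]


balancer = DynamicBalancer(total=1000, n_devices=3)
print(balancer.shares)  # [334, 333, 333]
balancer.update(simulate_times(balancer.shares))
print(balancer.shares)  # [100, 300, 600]
```

The tolerance check leaves a small residual imbalance in exchange for fewer redistributions. Moving work between devices, such as re-partitioning data across GPU memories, has a cost of its own.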
