CPU+GPU Load Balance Guided by Execution Time Prediction

We present a method that uses the CPU and GPU jointly to execute balanced parallel code generated automatically with polyhedral tools. To distribute the load evenly, the system is guided by predictions of loop nest execution times. Static and dynamic performance factors are modelled by two automatic and portable frameworks, one targeting CPUs and one targeting CUDA GPUs. Each prediction method comprises three stages: static code generation, offline profiling and online prediction. Several versions of each loop nest are generated; before execution, our scheduler balances the load for multiple combinations of code versions and selects the combination predicted to be fastest. The proposal is validated on the polyhedral benchmark suite; the results show that the CPU+GPU load stays balanced and that the scheduling overhead is low.
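The selection step can be illustrated with a minimal sketch. It is not the scheduler described in this work: it assumes a deliberately simple linear cost model (a constant predicted time per iteration for each code version, plus a fixed GPU overhead such as transfers), and all names (`balance`, `select_fastest`, the version labels and the timing values) are hypothetical. For each CPU/GPU version pair, the iterations are split so that both devices are predicted to finish at the same time, and the pair with the smallest predicted completion time is kept.

```python
from itertools import product


def balance(total_iters, cpu_time_per_iter, gpu_time_per_iter, gpu_overhead=0.0):
    """Split iterations so that predicted CPU and GPU finish times match.

    Assumed linear model (illustrative only):
        cpu_time = cpu_time_per_iter * n_cpu
        gpu_time = gpu_overhead + gpu_time_per_iter * n_gpu
    Solving cpu_time == gpu_time with n_cpu + n_gpu == total_iters gives n_cpu.
    """
    n_cpu = (gpu_overhead + gpu_time_per_iter * total_iters) / (
        cpu_time_per_iter + gpu_time_per_iter
    )
    # Clamp to a valid integer share; the GPU gets the remaining iterations.
    n_cpu = min(total_iters, max(0, round(n_cpu)))
    n_gpu = total_iters - n_cpu
    cpu_time = cpu_time_per_iter * n_cpu
    # If the GPU receives no work, it incurs no overhead in this model.
    gpu_time = gpu_overhead + gpu_time_per_iter * n_gpu if n_gpu > 0 else 0.0
    return n_cpu, n_gpu, max(cpu_time, gpu_time)


def select_fastest(total_iters, cpu_versions, gpu_versions):
    """Balance every CPU/GPU version pair and return the fastest one.

    cpu_versions: {name: predicted time per iteration}
    gpu_versions: {name: (predicted time per iteration, fixed overhead)}
    Returns (predicted_time, cpu_name, gpu_name, n_cpu, n_gpu).
    """
    best = None
    for (cpu_name, cpu_t), (gpu_name, (gpu_t, gpu_o)) in product(
        cpu_versions.items(), gpu_versions.items()
    ):
        n_cpu, n_gpu, predicted = balance(total_iters, cpu_t, gpu_t, gpu_o)
        candidate = (predicted, cpu_name, gpu_name, n_cpu, n_gpu)
        if best is None or candidate < best:
            best = candidate
    return best


# Hypothetical predictions (arbitrary time units), not measured data.
cpu_versions = {"cpu_v1": 3.0, "cpu_v2": 2.0}
gpu_versions = {"gpu_v1": (1.0, 0.0), "gpu_v2": (0.5, 200.0)}
best = select_fastest(1000, cpu_versions, gpu_versions)
print(best)
```

In this toy setting the pair `cpu_v2` and `gpu_v2` is selected, with 280 iterations assigned to the CPU and 720 to the GPU. A real predictor would replace the constant per-iteration times with the profiled, parameter-dependent models the abstract refers to.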
