CloudCL: Distributed Heterogeneous Computing on Cloud Scale

The ever-growing demand for computing resources has reached a wide range of application domains. Although the ubiquitous availability of cloud-based GPU instances provides an abundance of computing resources, the programmatic complexity of utilizing heterogeneous hardware in a scale-out scenario has not yet been addressed sufficiently. We address this issue by introducing the CloudCL framework, which enables developers to focus their implementation efforts on compute kernels without having to consider inter-node communication. Using CloudCL, developers can access the resources of an entire cluster as if they were local resources. The framework also facilitates cloud-native application behavior by supporting the dynamic addition and removal of resources at runtime. The combination of a straightforward job design and the corresponding job scheduling framework ensures that cluster resources are used efficiently and fairly. In an extensive performance evaluation, we demonstrate that the framework provides close-to-linear scale-out capabilities in multi-node deployment scenarios.
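To make the programming model alluded to above more concrete, the following is a minimal, self-contained sketch in plain Java (no CloudCL dependency) of a fork/join-style job: the developer supplies only the per-partition kernel logic, while splitting the work, merging partial results, and scheduling the partitions onto local or remote devices is what a framework such as CloudCL would take over. All type and method names below (SplittableJob, VectorAddJob, split, run) are illustrative assumptions, not the actual CloudCL API.

```java
// Hypothetical sketch of a splittable, data-parallel job. In this stand-alone
// version the sub-jobs run sequentially on the local host; a distributed
// framework would instead dispatch them to devices across the cluster.

import java.util.ArrayList;
import java.util.List;

interface SplittableJob<R> {
    List<? extends SplittableJob<R>> split(int parts); // fork into sub-jobs
    R run();                                            // compute one partition
}

final class VectorAddJob implements SplittableJob<float[]> {
    private final float[] a, b;
    private final int from, to;

    VectorAddJob(float[] a, float[] b, int from, int to) {
        this.a = a; this.b = b; this.from = from; this.to = to;
    }

    @Override
    public List<VectorAddJob> split(int parts) {
        // Partition the index range into roughly equal chunks.
        List<VectorAddJob> subJobs = new ArrayList<>();
        int chunk = Math.max(1, (to - from) / parts);
        for (int lo = from; lo < to; lo += chunk) {
            subJobs.add(new VectorAddJob(a, b, lo, Math.min(to, lo + chunk)));
        }
        return subJobs;
    }

    @Override
    public float[] run() {
        // The "kernel": element-wise addition over this job's partition.
        float[] out = new float[to - from];
        for (int i = from; i < to; i++) {
            out[i - from] = a[i] + b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 8;
        float[] a = new float[n], b = new float[n];
        for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

        // Stand-in for the framework's scheduler: split the job and run the
        // parts locally, then merge the partial results in order.
        float[] result = new float[n];
        int offset = 0;
        for (VectorAddJob sub : new VectorAddJob(a, b, 0, n).split(4)) {
            float[] partial = sub.run();
            System.arraycopy(partial, 0, result, offset, partial.length);
            offset += partial.length;
        }
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Because each sub-job is independent, a scheduler is free to assign partitions to whichever devices are currently available, which is what makes the dynamic addition and removal of resources at runtime, as claimed in the abstract, plausible for this style of job design.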
