A study of the effect of process malleability in the energy efficiency on GPU-based clusters

The adoption of graphics processing units (GPUs) in high-performance computing (HPC) infrastructures determines, in many cases, the energy consumption of those facilities. For this reason, efficient management and administration of GPU-enabled clusters is crucial for their optimal operation. The main aim of this work is to study and design efficient job-scheduling mechanisms for GPU-enabled clusters by leveraging process malleability techniques, which can reconfigure running jobs depending on the cluster status. This paper presents a model that improves energy efficiency when processing a batch of jobs in an HPC cluster. The model is validated with the MPDATA algorithm, a representative example of the stencil computations used in numerical weather prediction. The proposed solution applies the resulting efficiency metrics in a new reconfiguration policy aimed at job arrays. Compared with traditional job management, where jobs are not reconfigured during their execution, this solution reduces the processing time of workloads by up to 4.8 times and the energy consumption of the cluster by up to 2.4 times.
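The two reduction factors quoted above can be related through the identity E = P × t. The following is a minimal sketch, not the model from the paper: the function names and the measurement values are hypothetical, chosen only so that the factors come out as 4.8 and 2.4. It also assumes that both factors refer to the same workload, which the abstract does not state, since each figure is given as an "up to" bound.

```python
def reduction_factor(baseline: float, improved: float) -> float:
    """Return how many times smaller `improved` is than `baseline`."""
    if baseline <= 0 or improved <= 0:
        raise ValueError("measurements must be positive")
    return baseline / improved


def mean_power_ratio(time_factor: float, energy_factor: float) -> float:
    """Ratio of mean power draw (reconfigured run / baseline run).

    Follows from E = P * t: P_new / P_old = (E_new / E_old) / (t_new / t_old),
    which simplifies to time_factor / energy_factor.
    """
    return time_factor / energy_factor


# Hypothetical measurements, NOT taken from the paper.
baseline_time_s, malleable_time_s = 4800.0, 1000.0
baseline_energy_j, malleable_energy_j = 2.4e6, 1.0e6

time_factor = reduction_factor(baseline_time_s, malleable_time_s)
energy_factor = reduction_factor(baseline_energy_j, malleable_energy_j)
power_ratio = mean_power_ratio(time_factor, energy_factor)

print(f"time reduced {time_factor:.1f}x, energy reduced {energy_factor:.1f}x")
print(f"mean power during the reconfigured run: {power_ratio:.1f}x the baseline")
```

Under these assumptions, finishing 4.8 times sooner while using 2.4 times less energy implies a mean power draw about twice that of the baseline run. That would be consistent with the cluster being kept busier for a shorter period, but the abstract itself does not report power figures.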
