Running High Performance Linpack on CPUGPU clusters

A trend is developing in High-Performance Computing with cluster nodes built of general purpose CPUs and GPU accelerators. The common name of these systems is CPUGPU clusters. High Performance Linpack (HPL) benchmarking of High Performance Clusters consisting of nodes with both CPUs and GPUs is still a challenging task and deserves a high attention. In order to make HPL on such clusters more efficient, a multi-layered programming model consisting of at least Message Passing Interface (MPI), Multiprocessing (MP) and Streams Programming (Streams) needs to be utilized. Besides multi-layered programming model, it is crucial to deploy a right load-balancing scheme if someone wants to run HPL efficiently on CPUGPU systems. That means, besides the highest possible utilization rate, both fast and slow processors needs to receive appropriate portion of load, in order to avoid faster resources waiting on slower to finish their jobs. Moreover, in HPC clusters on Cloud, one has to take into account not only computing nodes of different processing power, but also a communication links of different speed between nodes as well. For this reasons we propose a load balancing method based on a semidefinite optimization. We hope that this method, coupled with a multi-layered programming, can perform a HPL benchmark on CPUGPU clusters and HPC Cloud systems more efficiently than methods used today.