Paper Information - Evaluation of Distributed Tasks in Stencil-based Application on GPUs

Evaluation of Distributed Tasks in Stencil-based Application on GPUs

In the era of exascale computing, the traditional MPI+X paradigm is losing its strength in taking advantage of heterogeneous systems. Consequently, research and development on alternative programming models and runtimes has become increasingly popular. This encourages comparing these emerging parallel programming approaches, on competitive grounds, against the traditional MPI+X paradigm. In this work, a distributed task-based implementation of a stencil numerical simulation is compared with an MPI+X implementation of the same application. More specifically, the Legion task-based parallel programming system is used as an alternative to MPI at the inter-node level, while the underlying CUDA kernels are retained at the node level. The comparison is therefore as fair as possible and focuses on the distributed aspects of the simulation. Overall, the results show that the task-based approach is on par with the traditional MPI approach in terms of both performance and scalability.