Performance Evaluation of OpenMP and MPI Hybrid Programs on a Large Scale Multi-core Multi-socket Cluster, T2K Open Supercomputer

Non-uniform memory access (NUMA) systems, where each processor has its own local memory, have become a popular platform in high-end computing. While some early studies reported that a flat-MPI programming model outperformed an OpenMP/MPI hybrid programming model on SMP clusters, the hybrid of shared-memory, thread-based programming and distributed-memory, message-passing programming is considered a promising programming model for multi-core multi-socket NUMA clusters. We explore the performance of the OpenMP/MPI hybrid programming model on a large-scale multi-core multi-socket cluster, the T2K Open Supercomputer. Both benchmark (NPB, the NAS Parallel Benchmarks) and application (RSDFT, Real-Space Density Functional Theory) codes are considered. The hybridization of the RSDFT code is also described. Our experiments show that a multi-core multi-socket cluster can take advantage of the hybrid programming model when MPI is used across sockets and OpenMP within each socket.