Exploiting multi-core processors for scientific applications using hybrid MPI-OpenMP

Most current and emerging high-performance systems consist of large numbers of processors arranged in architectures with ‘fat’ shared-memory nodes supporting tens of threads per node. There are good reasons to adopt a hybrid MPI-OpenMP programming model for large-scale applications on such architectures, but doing so adds complexity to the parallel program and demands scalability at two levels: MPI across nodes and OpenMP within a node. We present performance and scaling studies for four applications (Fluidity-ICOM, NEMO, PRMAT and a 3D Red-Black Smoother) that use the hybrid MPI-OpenMP programming model. We show that, for computations using a large number of cores, the hybrid approach significantly improves performance, provided that algorithms with minimal synchronisation and suitable libraries are used.