Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations

The recently released MPI-3.0 standard introduced a process-level shared-memory interface which enables processes within the same node to have direct load/store access to each other's memory. Such an interface allows applications to declare data structures that are shared by multiple MPI processes on the same node. In this paper, we study the capabilities and performance implications of using MPI-3.0 shared memory in the context of a five-point stencil computation. Our analysis reveals that the use of MPI-3.0 shared memory has several unforeseen performance implications, including disrupting certain compiler optimizations and causing the OS to choose suboptimal page sizes. Based on this analysis, we propose several methodologies for working around these issues and improving communication performance by 40-85% compared with the current MPI-1.0-based approach.
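
The shared-memory interface referred to above consists mainly of MPI_Comm_split_type (with MPI_COMM_TYPE_SHARED), MPI_Win_allocate_shared, and MPI_Win_shared_query. The following minimal sketch, with illustrative slab sizes, variable names, and fence-based synchronization rather than the paper's actual stencil code, shows how processes on one node can obtain direct load/store access to each other's slabs.

/* Minimal sketch (not the paper's implementation): a node-level shared
 * array allocated with the MPI-3.0 shared-memory interface.
 * Sizes and names are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Group the processes that can share memory (i.e., on the same node). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Each process contributes one slab of the shared array. */
    const MPI_Aint local_elems = 1024;   /* illustrative slab size */
    double *my_slab;
    MPI_Win win;
    MPI_Win_allocate_shared(local_elems * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node_comm, &my_slab, &win);

    /* Initialize the local slab with ordinary stores. */
    for (MPI_Aint i = 0; i < local_elems; i++)
        my_slab[i] = (double)node_rank;

    /* Synchronize so that all slabs are initialized and visible
     * before any process reads a neighbor's memory. */
    MPI_Win_fence(0, win);

    /* Direct load access to the left neighbor's slab via its base pointer. */
    if (node_rank > 0) {
        MPI_Aint nbr_size;
        int nbr_disp;
        double *nbr_slab;
        MPI_Win_shared_query(win, node_rank - 1, &nbr_size, &nbr_disp,
                             &nbr_slab);
        printf("rank %d reads neighbor halo value %.1f\n",
               node_rank, nbr_slab[local_elems - 1]);
    }

    MPI_Win_fence(0, win);
    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

In a stencil (halo-exchange) setting, the pointer returned by MPI_Win_shared_query can take the place of explicit send/receive of ghost cells between processes on the same node.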
