Globalizing selectively: Shared-memory efficiency with address-space separation

MPI-based applications now commonly run on shared-memory machines. However, MPI semantics prevent the MPI library itself from fully exploiting shared memory for communication between processes. This paper presents an approach that combines compiler transformations with a specialized runtime system to achieve zero-copy communication whenever possible. The compiler proves certain properties statically and globalizes data selectively by altering how communication buffers are allocated and deallocated. When such proofs are not possible statically, the runtime system optimizes dynamically, copying data only when there are write-write or read-write conflicts. We implemented a prototype compiler using ROSE and evaluated it on several benchmarks. Our system produces code that performs better than MPI in most cases and, in all cases, no worse than MPI tuned for shared memory.
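The copy-only-on-conflict idea can be illustrated with a toy model. The sketch below is not the paper's runtime or its API: the names `SharedBuffer`, `Handle`, `send`, and `copies_made` are hypothetical, and it assumes a simple holder count is enough to detect a conflict. A "send" hands the receiver a reference to the same buffer (zero-copy), and a private copy is made only when one side writes while the other still holds the buffer.

```python
class SharedBuffer:
    """Toy stand-in for a globalized communication buffer (hypothetical)."""

    def __init__(self, data):
        self.data = list(data)
        self.holders = 1  # number of processes currently referencing this buffer


class Handle:
    """One process's view of a buffer (hypothetical, for illustration only)."""

    copies_made = 0  # counts how many real copies the toy runtime performed

    def __init__(self, buf):
        self.buf = buf

    def send(self):
        """Zero-copy 'send': the receiver gets a reference, not a copy."""
        self.buf.holders += 1
        return Handle(self.buf)

    def read(self, i):
        """Reads never conflict with other reads, so no copy is needed."""
        return self.buf.data[i]

    def write(self, i, value):
        """Copy only if another holder could observe or overwrite this write."""
        if self.buf.holders > 1:
            # Read-write or write-write conflict: detach onto a private copy.
            self.buf.holders -= 1
            self.buf = SharedBuffer(self.buf.data)
            Handle.copies_made += 1
        self.buf.data[i] = value


sender = Handle(SharedBuffer([1, 2, 3]))
receiver = sender.send()   # no data copied
sender.write(0, 99)        # conflict with the receiver: one copy is made
```

In this sketch the receiver still sees the original contents after the sender's write, which is the behavior separate address spaces would give. A real system would also have to handle concurrency and buffer deallocation, both of which this sketch ignores.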
