Enhancing Performance Portability of MPI Applications through Annotation-Based Transformations

MPI is the de facto standard for portable parallel programming on high-end systems. However, while the MPI standard provides functional portability, it does not guarantee performance portability across platforms. We present a framework that lets users annotate MPI applications with hints about the communication patterns they use. An automated program transformation system then uses these annotations to substitute MPI operations that better match each target system's capabilities. Our framework currently supports three automated transformations: coalescing of MPI one-sided communication operations; conversion of blocking communications into nonblocking ones, which enables communication-computation overlap; and selection of appropriate communication operations based on the cache-coherence support of the underlying platform. We apply our annotation-based approach to several benchmark kernels and demonstrate that the framework is effective at automatically improving the performance portability of MPI applications.