Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient Using MPI Datatypes

Many parallel applications need to communicate noncontiguous data. Most applications manually copy (pack and unpack) data before and after communication, even though MPI allows a zero-copy specification through derived datatypes. In this work, we study two complex use cases: (1) a Fast Fourier Transform (FFT), where we express a local memory transpose as part of the datatype, and (2) a conjugate gradient solver with a checkerboard layout, which requires multiple nested datatypes. We demonstrate speedups of up to a factor of 3.8 for the FFT and 18% for the conjugate gradient solver. Application developers can use our work as a template for utilizing datatypes. For MPI implementers, we show two practically relevant access patterns that deserve special optimization.
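As a rough illustration of the first use case, the sketch below models in plain Python, without calling MPI, how a receive datatype could fold a local transpose into the communication itself. It assumes a column type laid out like `MPI_Type_vector(N, 1, N, ...)` whose extent is then resized to one element (as `MPI_Type_create_resized` permits), so that consecutive instances start one element apart. The helper names `vector_typemap` and `receive_with_type` are hypothetical and only emulate the type map; the paper may construct its datatypes differently.

```python
def vector_typemap(count, blocklength, stride):
    """Element offsets touched by one instance of a strided vector type.

    Models the layout of MPI_Type_vector(count, blocklength, stride, ...),
    with offsets measured in elements rather than bytes.
    """
    return [i * stride + j for i in range(count) for j in range(blocklength)]


def receive_with_type(buffer, stream, typemap, extent, ninstances):
    """Place a contiguous incoming stream into `buffer` following the type map.

    Instance k of the type starts at element offset k * extent, which is
    what resizing the extent of a derived datatype controls in MPI.
    """
    values = iter(stream)
    for k in range(ninstances):
        for offset in typemap:
            buffer[k * extent + offset] = next(values)
    return buffer


N = 3
# Row-major N x N block as it would arrive contiguously from the sender:
# rows [0, 1, 2], [3, 4, 5], [6, 7, 8].
stream = list(range(N * N))

# One "column" of the receive buffer: N single elements, N elements apart.
column = vector_typemap(count=N, blocklength=1, stride=N)

# Receiving N columns, each shifted by one element (assumed extent of 1),
# stores the block transposed without any separate copy step.
recv = receive_with_type([None] * (N * N), stream, column, extent=1, ninstances=N)

for row in range(N):
    print(recv[row * N:(row + 1) * N])
```

In this model each incoming row lands in a column of the receive buffer, so the block ends up transposed as a side effect of the receive. In a real MPI program the library would perform this placement from the committed datatype, which is what removes the manual unpack loop.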
