Implementation and Shared-Memory Evaluation of MPICH2 over the Nemesis Communication Subsystem

This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as some of the optimization techniques we employed. We conducted a performance evaluation over shared memory using microbenchmarks as well as application benchmarks. The evaluation shows that MPICH2 Nemesis has very low communication overhead, making it suitable for smaller-grained applications.
