MPI is gaining acceptance as a standard for message passing in high-performance computing, owing to its powerful and flexible support for various communication styles. However, the complexity of its API imposes significant software overhead, and as a result the applicability of MPI has been restricted to rather regular, coarse-grained computations. Our OMPI (Optimizing MPI) system removes much of this excess overhead by employing partial evaluation techniques that exploit static information about MPI calls. Because partial evaluation alone is insufficient, we also use template functions for further optimization. To validate the effectiveness of our OMPI system, we performed baseline as well as more extensive benchmarks on a set of application cores with different communication characteristics on the 64-node Fujitsu AP1000 MPP. The benchmarks show that OMPI improves execution efficiency by as much as a factor of two for a communication-intensive application core, with minimal code-size increase. It also performs significantly better than a previous dynamic optimization technique.
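To make the mechanism concrete, the following C sketch (our illustration, not code from the paper; the helper ompi_send_contig_int and its folding behavior are hypothetical) shows the kind of call-site specialization that partial evaluation of statically known MPI arguments enables: when the datatype, tag, and communicator of a send are compile-time constants, the generic call can be bound to a specialized function with those constants folded in.

    /* Sketch: specializing an MPI call site whose datatype, tag, and
     * communicator are known at compile time.  The specialized helper
     * is hypothetical; its body forwards to MPI_Send so the example
     * compiles and runs against any MPI implementation. */
    #include <mpi.h>
    #include <stdio.h>

    /* If a partial evaluator proves the datatype is MPI_INT, the tag
     * is 0, and the communicator is MPI_COMM_WORLD, it can emit this
     * template function, skipping the per-call argument decoding that
     * a fully generic MPI_Send must perform. */
    static inline void ompi_send_contig_int(const int *buf, int n, int dest)
    {
        MPI_Send((void *)buf, n, MPI_INT, dest, 0, MPI_COMM_WORLD);
    }

    int main(int argc, char **argv)
    {
        int rank, x = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            /* Generic form for comparison (all six arguments would be
             * interpreted at run time):
             *   MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);      */
            ompi_send_contig_int(&x, 1, 1);   /* specialized call site */
        } else if (rank == 1) {
            MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", x);
        }
        MPI_Finalize();
        return 0;
    }

In this reading, the template function plays the role the abstract ascribes to it: it supplies a hand-tuned code shape that partial evaluation alone could not synthesize, while the partial evaluator decides, per call site, whether the static information suffices to select it.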