We describe an implementation of a sizable subset of OpenMP on networks of workstations (NOWs). Extending OpenMP to NOWs overcomes one of its primary drawbacks compared to MPI: lack of portability to environments other than hardware shared memory machines. To support OpenMP execution on NOWs, our compiler targets a software distributed shared memory (DSM) system, which provides multi-threaded execution and memory consistency. This paper makes two contributions. First, we identify two aspects of the current OpenMP standard that make an implementation on NOWs hard, and suggest simple modifications to the standard that remedy the situation. These problems reflect differences in memory architecture between software and hardware shared memory and the high cost of synchronization on NOWs. Second, we present performance results for a prototype implementation of an OpenMP subset on a NOW, and compare them with hand-coded software DSM and MPI results for the same applications on the same platform. We use five applications (ASCI Sweep3d, NAS 3D-FFT, SPLASH-2 Water, QSORT, and TSP) exhibiting various styles of parallelization, including pipelined execution, data parallelism, coarse-grained parallelism, and task queues. The measurements show little difference between OpenMP and hand-coded software DSM, but both still lag behind MPI. Further work will concentrate on compiler optimization to reduce these differences.