Message passing vs. shared address space on a cluster of SMPs

The emergence of scalable computer architectures built from clusters of PCs (or PC-SMPs) with commodity networking has made such systems attractive platforms for high-end scientific computing. Currently, message passing (MP) and shared address space (SAS) are the two leading programming paradigms for these systems. MP has been standardized with MPI and is the most common and mature parallel programming approach. However, MP code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of, and the programming effort required for, six applications under both programming models on a 32-CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and complex communication patterns. Results indicate that SAS achieves about half the parallel efficiency of MPI for most of our applications; on certain classes of problems, however, SAS performance is competitive with MPI.
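To make the contrast between the two paradigms concrete, the sketch below is our own minimal illustration, not code from the paper; the cyclic data distribution, the function names, and the use of OpenMP as a stand-in for the SAS model are all assumptions. It computes the same global sum both ways: the MP version gives each rank a slice of the data and combines partial results through an explicit MPI_Allreduce, while the SAS version iterates over a logically shared index space and leaves synchronization to a reduction clause.

/* Hypothetical sketch (not from the paper): one global sum in both paradigms.
 * Build: mpicc -fopenmp sum.c -o sum  &&  mpirun -np 4 ./sum
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000L

/* Message passing (MP): each rank owns a slice of the iteration space,
 * and the programmer moves partial results explicitly. */
static double mp_sum(int rank, int nprocs) {
    double local = 0.0, global = 0.0;
    for (long i = rank; i < N; i += nprocs)   /* cyclic distribution */
        local += (double)i;
    /* Explicit communication step: combine partial sums across ranks. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return global;
}

/* Shared address space (SAS): all threads see one accumulator; there is
 * no explicit data movement, only synchronization (here a reduction). */
static double sas_sum(void) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += (double)i;
    return sum;
}

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double mp = mp_sum(rank, nprocs);
    if (rank == 0)  /* SAS version runs threads within one node's address space */
        printf("MP  sum = %.0f\nSAS sum = %.0f\n", mp, sas_sum());

    MPI_Finalize();
    return 0;
}

Even in this toy case, the MP version forces the programmer to choose a data distribution and insert a collective, whereas the SAS version is a one-pragma change from the sequential loop; the paper's point is that this ease of programming can come at the cost of locality and protocol overhead on a cluster.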
