The Quest for a Zero Overhead Shared Memory Parallel Machine

In this paper we present a new approach to benchmark the performance of shared memory systems. This approach focuses on recognizing how far off the performance of a given memory system is from a realistic ideal parallel machine. We define such a realistic machine model, called the z-machine, that accounts for the inherent communication costs in an application by tracking the data flow in the application. The z-machine is incorporated into an execution-driven simulation framework and is used as a reference for benchmarking different memory systems. The components of the overheads in these memory systems are identified and quantified for four applications. Using the z-machine performance as the standard to strive for, we discuss the implications of the performance results and suggest architectural trends to pursue for realizing a zero overhead shared memory machine.
