
The Quest for a Zero Overhead Shared Memory Parallel Machine

In this paper we present a new approach to benchmarking the performance of shared memory systems. This approach focuses on recognizing how far the performance of a given memory system falls short of that of a realistic ideal parallel machine. We define such a realistic machine model, called the z-machine, that accounts for the inherent communication costs in an application by tracking the data flow in the application. The z-machine is incorporated into an execution-driven simulation framework and is used as a reference for benchmarking different memory systems. The components of the overheads in these memory systems are identified and quantified for four applications. Using the z-machine performance as the standard to strive for, we discuss the implications of the performance results and suggest architectural trends to pursue for realizing a zero overhead shared memory machine.
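The abstract does not spell out the z-machine's internals. The following is therefore only a minimal Python sketch, under a simplified model assumed here, of the two ideas it names: tracking data flow to count the communication that is inherent to the application, and expressing a memory system's overhead relative to an ideal reference. The `Access` trace record, the rule that a value is transferred to each consumer at most once, and the names `inherent_comm_events` and `relative_overhead` are hypothetical illustrations, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One shared-memory reference in an execution trace (hypothetical format)."""
    proc: int   # issuing processor
    op: str     # "R" for read, "W" for write
    addr: int   # word address


def inherent_comm_events(trace):
    """Count data transfers the application itself requires.

    ASSUMPTION (simplified model, not the paper's exact definition): a value
    must travel between processors only when a processor reads a location
    whose most recent write came from another processor, and each written
    value is transferred to a given consumer at most once.
    """
    last_write = {}     # addr -> (writer proc, index of that write in the trace)
    delivered = set()   # (consumer proc, write index) pairs already transferred
    events = 0
    for i, acc in enumerate(trace):
        if acc.op == "W":
            last_write[acc.addr] = (acc.proc, i)
        elif acc.op == "R" and acc.addr in last_write:
            writer, write_idx = last_write[acc.addr]
            if writer != acc.proc and (acc.proc, write_idx) not in delivered:
                delivered.add((acc.proc, write_idx))
                events += 1
    return events


def relative_overhead(measured_cycles, ideal_cycles):
    """Fraction by which a memory system's run time exceeds the ideal reference."""
    if ideal_cycles <= 0:
        raise ValueError("ideal_cycles must be positive")
    return (measured_cycles - ideal_cycles) / ideal_cycles


if __name__ == "__main__":
    trace = [
        Access(0, "W", 0),  # P0 produces a value at address 0
        Access(1, "R", 0),  # P1 consumes it: one inherent transfer
        Access(1, "R", 0),  # re-read of the same value: no new transfer
        Access(2, "R", 0),  # P2 consumes it: a second transfer
        Access(0, "R", 0),  # producer reads its own value: no transfer
        Access(1, "W", 1),  # P1 produces a value at address 1
        Access(0, "R", 1),  # P0 consumes it: a third transfer
    ]
    print(inherent_comm_events(trace))    # 3
    print(relative_overhead(150, 100))    # 0.5
```

Counting each (consumer, write) pair once reflects the assumption that an ideal machine pays only for moving a produced value to a consumer, not for re-reading it; under this model, a real memory system that incurs extra cost on such a re-read would show that cost as overhead relative to the reference.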
