Performance Evaluation of the Slotted Ring Multiprocessor

As microprocessor speeds continue to improve rapidly, the bandwidth requirements for system-level interconnections in multiprocessors may eventually rule out shared buses, even for small-scale multiprocessors. High-speed unidirectional links, on the other hand, are an emerging technology that has the potential to scale with microprocessor technology and could replace buses as the interconnection fabric for future multiprocessors. We evaluate the performance of the unidirectional slotted ring interconnection for small- to medium-scale shared-memory systems, using a hybrid methodology of analytical models and trace-driven simulations. Memory traces from actual executions of parallel programs drive detailed event-driven simulations of a variety of ring and bus multiprocessors. Snooping and directory coherence protocols for the slotted ring are evaluated in the context of multitasking. Snooping is shown to outperform full-map and linked-list directory schemes on the unidirectional slotted ring, and it also compares favorably with high-performance split-transaction bus systems.
