Impact of switch design on the application performance of cache-coherent multiprocessors

The effect of switch design on the application performance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors is studied in detail. Wormhole routing and cut-through switching are evaluated for shared-memory multiprocessors that employ a multistage interconnection network (MIN) and a full-map directory-based cache coherence protocol. The switch design also considers virtual channels and a varying number of input buffers per switch. Based on these design choices, four different switch architectures are presented and compared. The evaluation uses execution-driven simulation of five different applications to capture the random, bursty nature of network traffic arrivals. A round-robin memory management policy is implemented. The authors show that cut-through switching with buffers and virtual channels substantially reduces the average message latency. The waiting times of messages at the various switch stages are also presented. Finally, they show how stall times and execution times for these applications vary with switch delay and wire width.
