Spin-lock synchronization on the Butterfly and KSR1

The drawbacks of the simple spin-lock limit its effective use to small critical sections. Applications with large critical sections and a large number of processors require more efficient algorithms to minimize processor and network overheads. Variations on the spin-lock have been tested on the Sequent Symmetry, a bus-based shared-memory multiprocessor. Algorithms for scalable synchronization have also been tested on the BBN Butterfly I, a large-scale shared-memory multiprocessor with a multistage interconnection network(MIN). We have extended the investigation to the BBN GP1000 and TC2000, both MIN-based multiprocessors with network contention heavier than that on the Butterfly I. We have also implemented algorithms on Kendall Square Research's KSR1, a hierarchical-ring multiprocessor system, to study the effects of cache coherence. The execution behavior of spin-lock algorithms is significantly different between MIN-based and HR-based architectures. Our tests suggest that HR-based architectures handle network and memory contention more efficiently than MIN-based architectures. However, our results also suggest how spin-locks can be made cost-effective on both.<<ETX>>