Hot spot analysis in large scale shared memory multiprocessors

Scalable multiprocessors that present a shared-memory image to application programmers are typically built from physically distributed memory modules. Consequently, the time a given processor needs to access memory depends on which part of physical memory it addresses. The authors explore the implications of this nonuniformity in memory access times. In particular, they study the effect of hot spots in hierarchical large-scale NUMA multiprocessors. They have developed an analytical model of access latencies and of contention for shared resources in the interconnection network that links the processors and memory modules. The objective is to provide a better understanding of nonuniform memory access times in scalable architectures. They show the extent to which a variable can be shared before it becomes a performance bottleneck, and they assess the potential gain from replicating shared data items. They also demonstrate that the backoff value used after a memory request is rejected must be chosen carefully to balance memory access time against network utilization, and that memory utilization improves when memory requests can be buffered.
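The backoff trade-off mentioned above can be illustrated with a small toy simulation. This is not the authors' analytical model. It is a minimal sketch under assumptions made only for illustration: a single hot memory module with no request buffering, a fixed service time, a fixed backoff, and a random think time between requests. The function name `simulate_hot_spot` and all of its parameters are hypothetical.

```python
import random


def simulate_hot_spot(num_procs, backoff, service_time=4,
                      think_time=50, cycles=20000, seed=0):
    """Toy model (an assumption, not the paper's model).

    One hot memory module serves one request at a time and has no buffer.
    A request that arrives while the module is busy is rejected and
    re-issued after `backoff` cycles.
    Returns (mean latency in cycles, attempts per completed request).
    """
    if backoff < 1:
        raise ValueError("backoff must be at least 1 cycle")
    rng = random.Random(seed)
    # Cycle at which each processor next issues (or re-issues) a request.
    next_try = [rng.randint(0, think_time) for _ in range(num_procs)]
    # Cycle at which each processor first issued its current request.
    issued_at = list(next_try)
    busy_until = 0
    attempts = completed = total_latency = 0

    for now in range(cycles):
        ready = [p for p in range(num_procs) if next_try[p] == now]
        rng.shuffle(ready)  # arbitrary arbitration among simultaneous requests
        for p in ready:
            attempts += 1  # every attempt crosses the network
            if now >= busy_until:
                # Accepted: the module is busy for service_time cycles.
                busy_until = now + service_time
                completed += 1
                total_latency += busy_until - issued_at[p]
                # Think for a while, then issue a fresh request.
                next_try[p] = busy_until + rng.randint(1, think_time)
                issued_at[p] = next_try[p]
            else:
                # Rejected: retry after the backoff interval.
                next_try[p] = now + backoff

    if completed == 0:
        raise RuntimeError("no request completed; increase cycles")
    return total_latency / completed, attempts / completed


if __name__ == "__main__":
    for b in (1, 4, 16, 64):
        latency, traffic = simulate_hot_spot(num_procs=16, backoff=b)
        print(f"backoff={b:3d}  mean latency={latency:7.1f}  "
              f"attempts/request={traffic:6.2f}")
```

In this sketch, attempts per completed request stands in for network load and mean latency stands in for memory access time. A short backoff tends to flood the network with retries, while a long backoff can leave the module idle while processors wait, which is the tension the abstract describes.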
