Computation/communication balance-point modeling in multiprocessors

An analytic model is developed for predicting processor utilization in a CC-NUMA (cache-coherent non-uniform memory access) shared-memory multiprocessor. The interconnection network in such systems carries cache-line messages between processor caches and memory modules on read and write misses. In large systems, network delay is the major component of the miss penalty. The model uses only a relatively small number of node parameters (cache miss rate, cache line length, number of outstanding transfer requests allowed, memory access time, and the proportion of reads to writes), together with the bandwidth and the delay-versus-throughput characteristics of the network. Its processor utilization estimates agree well with those obtained from an independent, detailed simulation study. For multiprocessors of 72 and 108 nodes, and across variations in the node parameters, the utilization values given by the analytic model are within 10% of the simulation results, with processor utilizations ranging from 0.41 to 0.88. The interconnection network studied is a hierarchical slotted-ring system.
