Performance and configuration of hierarchical ring networks for multiprocessors

Analytical queueing network models are developed for the expected message delay in 2-level and 3-level hierarchical-ring interconnection networks (INs). Such networks have recently been used in commercial and research prototype multiprocessors. In shared-memory multiprocessors, a major class of traffic carried by these INs consists of cache line transfers, and the associated coherency control messages, between processor caches and remote memory modules. Memory modules are assumed to be evenly distributed over the processor nodes. This traffic consists of short, fixed-length messages, which can be conveniently transported using the slotted-ring transmission technique studied here. The message delays predicted by the models are shown to be quite accurate when checked against a simulation study. The comparisons include heavy-traffic situations, with ring utilization levels of 80 to 90%, in which queueing delays in the ring crossover switches are significant. Besides facilitating performance analysis, the analytical models can be used to determine the optimal sizes of the rings at each level of the hierarchy, under a specified traffic distribution, for a system with a given total number of processor nodes. Optimality here means minimizing the average message delay. A specific example of such a design exercise is provided for the uniform traffic case.
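
The design exercise mentioned above can be illustrated with a small Python sketch for the 2-level case. It enumerates every way of splitting a given number of processor nodes into equal-sized local rings and keeps the split with the smallest delay estimate. The function names (`ring_size_pairs`, `hop_count_proxy`, `best_two_level`) are hypothetical, and the default cost function is only a crude zero-load hop-count proxy under uniform traffic. It is an assumption made for illustration and is not the queueing model developed in the paper; any other delay estimate can be passed in its place.

```python
from typing import Callable, Iterator, Tuple


def ring_size_pairs(total_nodes: int) -> Iterator[Tuple[int, int]]:
    """Yield (nodes_per_local_ring, number_of_local_rings) pairs, both >= 2,
    whose product equals total_nodes."""
    for n in range(2, total_nodes // 2 + 1):
        if total_nodes % n == 0:
            yield n, total_nodes // n


def hop_count_proxy(n: int, m: int) -> float:
    """Illustrative stand-in for average message delay (NOT the paper's model).

    Assumptions: unidirectional rings, uniform traffic, no queueing, and a
    mean travel distance of k/2 hops on a ring with k stations. Each local
    ring holds n nodes plus one crossover switch; the global ring holds the
    m crossover switches.
    """
    total = n * m
    p_local = (n - 1) / (total - 1)   # destination lies on the source's ring
    local_hops = (n + 1) / 2          # one traversal of a local ring
    global_hops = m / 2               # one traversal of the global ring
    remote_hops = local_hops + global_hops + local_hops
    return p_local * local_hops + (1 - p_local) * remote_hops


def best_two_level(
    total_nodes: int,
    delay: Callable[[int, int], float] = hop_count_proxy,
) -> Tuple[int, int, float]:
    """Return (n, m, cost) minimizing delay(n, m) over all valid splits."""
    candidates = [(delay(n, m), n, m) for n, m in ring_size_pairs(total_nodes)]
    if not candidates:
        raise ValueError("total_nodes has no split into rings of size >= 2")
    cost, n, m = min(candidates)
    return n, m, cost


if __name__ == "__main__":
    n, m, cost = best_two_level(16)
    print(f"local ring size {n}, local rings {m}, proxy cost {cost:.3f}")
```

Passing the delay estimate as a parameter keeps the search separate from the model, so a more faithful delay formula could be substituted without changing the enumeration.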
