Torus Ring: improving performance of interconnection network by modifying hierarchical ring

In multiprocessor systems, interconnection network design is critical to overall system performance. Unidirectional ring-based networks have been a popular choice for high-performance, large-scale shared-memory multiprocessor systems. In this paper, we propose the "Torus Ring", a modified version of the two-level hierarchical ring. The Torus Ring has the same complexity as the hierarchical ring; the only difference is the way it connects the local rings. Compared to the hierarchical ring, the Torus Ring exploits the memory access locality of application programs more efficiently. It has an advantage when the destination of a packet is an adjacent local ring, especially the backward adjacent local ring. Even under the assumption that packet destinations are uniformly distributed across the processing nodes, the average number of hops in the Torus Ring is equal to that of the hierarchical ring. In real parallel programming environments, where application programs exhibit memory access locality, the performance gain of the Torus Ring is expected to be larger. In our simulation results, the Torus Ring reduces interconnection network latency by up to 19% and execution time by up to 10% at a moderate ring utilization ratio.
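To make the hop-count argument concrete, the following is a minimal sketch of hop counting in a conventional two-level unidirectional hierarchical ring, the baseline the Torus Ring is compared against. It does not model the Torus Ring itself, whose local-ring wiring is not specified here. The model is an assumption for illustration only: each local ring holds `nodes_per_ring` processing nodes at positions 1..N plus one inter-ring interface at position 0, one hop is one link traversal, and the names `hops` and `average_hops` are hypothetical.

```python
from itertools import product


def hops(src, dst, num_rings, nodes_per_ring):
    """Link traversals from src to dst, each given as a (ring, position) pair.

    Assumed model: positions 1..nodes_per_ring are processing nodes,
    position 0 is the interface to the global ring, and every ring is
    unidirectional (packets move toward increasing index, wrapping around).
    """
    (rs, a), (rd, b) = src, dst
    stations = nodes_per_ring + 1  # processing nodes plus the interface
    if rs == rd:
        # Same local ring: travel forward only.
        return (b - a) % stations
    to_interface = (0 - a) % stations   # reach the interface on the source ring
    on_global = (rd - rs) % num_rings   # forward-only travel on the global ring
    from_interface = b                  # interface (position 0) to the destination
    return to_interface + on_global + from_interface


def average_hops(num_rings, nodes_per_ring):
    """Average hop count over all ordered pairs of distinct nodes (uniform traffic)."""
    nodes = [(r, p) for r in range(num_rings)
             for p in range(1, nodes_per_ring + 1)]
    pairs = [(s, d) for s, d in product(nodes, nodes) if s != d]
    total = sum(hops(s, d, num_rings, nodes_per_ring) for s, d in pairs)
    return total / len(pairs)
```

Under these assumptions, with 4 local rings of 4 nodes each, `hops((0, 1), (1, 1), 4, 4)` is 6 for the forward adjacent ring, while `hops((1, 1), (0, 1), 4, 4)` is 8 for the backward adjacent ring, because the packet must travel almost all the way around the global ring. The gap grows with the number of local rings, which is the case where the abstract reports the largest benefit.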
