Dense Gaussian Networks: Suitable Topologies for On-Chip Multiprocessors

This paper explores the suitability of dense circulant graphs of degree four for the design of on-chip interconnection networks. Networks based on these graphs reduce the diameter of the torus by a factor of $\sqrt{2}$, which translates into significant performance gains for unicast traffic. In addition, they are clearly superior to tori in handling collective communications. This paper introduces a new two-dimensional labeling of the nodes of these networks, which simplifies their analysis and exploitation. In particular, it provides simple and optimal solutions to two important architectural issues: routing and broadcasting. Other implementation issues, such as network folding and scalability through hierarchical networks, are also explored in this work.
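As a rough illustration of the diameter claim, the sketch below assumes the standard dense Gaussian network of diameter $k$, viewed as the circulant graph with $N = k^2 + (k+1)^2 = 2k^2 + 2k + 1$ nodes and jumps $\pm 1$ and $\pm(2k+1)$. It does not use the paper's two-dimensional labeling. Instead, the hypothetical helper `route` finds a minimal routing record by brute-force search over the number of hops along the $2k+1$ jump. `diameter` takes the worst case from node 0, which suffices because circulant graphs are vertex-transitive. All function names are illustrative assumptions.

```python
def dense_gaussian_order(k: int) -> int:
    """Number of nodes N = k^2 + (k+1)^2 of the dense network of diameter k."""
    return k * k + (k + 1) * (k + 1)


def centered(r: int, n: int) -> int:
    """Representative of r modulo n in the range (-n/2, n/2]."""
    r %= n
    return r - n if r > n // 2 else r


def route(src: int, dst: int, k: int) -> tuple[int, int]:
    """Minimal routing record (x, y) from src to dst (hypothetical helper).

    x = signed hops along jump 1, y = signed hops along jump s = 2k + 1.
    Any path moves a packet by x + s*y (mod N), so a shortest path is a
    pair with x + s*y = dst - src (mod N) and least |x| + |y|.
    Brute force: for each residue of y, the best x is a centered residue.
    """
    n = dense_gaussian_order(k)
    s = 2 * k + 1
    d = (dst - src) % n
    best = (0, 0)
    best_cost = None
    # n is odd, so this range visits every residue of y exactly once.
    for y in range(-(n // 2), n // 2 + 1):
        x = centered(d - s * y, n)
        cost = abs(x) + abs(y)
        if best_cost is None or cost < best_cost:
            best, best_cost = (x, y), cost
    return best


def diameter(k: int) -> int:
    """Largest hop count from node 0 (enough: circulants are vertex-transitive)."""
    n = dense_gaussian_order(k)
    return max(abs(x) + abs(y) for x, y in (route(0, d, k) for d in range(n)))


def torus_diameter(side: int) -> int:
    """Diameter of a side x side 2D torus with wrap-around links."""
    return 2 * (side // 2)


if __name__ == "__main__":
    k = 10
    n = dense_gaussian_order(k)  # 221 nodes
    # Compare with a 15 x 15 torus, which has a similar node count (225).
    print(n, diameter(k), torus_diameter(15))
```

For example, with $k = 10$ the dense network has 221 nodes and diameter 10, while a 15 × 15 torus (225 nodes) has diameter 14. That ratio is close to $\sqrt{2}$.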
