SymSig: A low latency interconnection topology for HPC clusters

This paper presents the underlying theory and the measured performance of a cluster built on a new 2-hop network topology. The topology, called SymSig, is constructed using a symmetric equation and Singer Difference Sets. The node degree in SymSig is about half that of previous methods based on Singer Difference Sets. A comparison with a cluster of Clos topology shows significant advantages. The worst-case congestion in the SymSig topology for a unicast permutation is 2, whereas in Clos it is proportional to the radix of the building-block switches used. Compared with Clos, SymSig requires about 25% fewer switches, supports a cluster about 15% larger, and provides about 50% better worst-case bandwidth. These advantages are retained for petascale and exascale systems. Its performance on a set of collectives such as exchange-all, shift-all, broadcast-all and all-to-all send/receive shows improvements ranging from 39% to 83%. Its performance on the molecular dynamics application GROMACS shows an improvement of up to 33%. This network is particularly suitable for applications that require global all-to-all communication. The low latency of this network makes it scalable and an attractive alternative for building petascale and exascale systems.
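The construction can be illustrated on a toy instance. The following is a minimal sketch, not the paper's implementation: it assumes that the "symmetric equation" links nodes i and j whenever (i + j) mod N belongs to a Singer (perfect) difference set D, a form the abstract does not spell out. The function names and the choice D = {0, 1, 3} with N = 7 are hypothetical, picked only for illustration.

```python
from collections import deque


def is_perfect_difference_set(d, n):
    """True if every nonzero residue mod n occurs exactly once as a
    difference of two distinct elements of d."""
    diffs = sorted((a - b) % n for a in d for b in d if a != b)
    return diffs == list(range(1, n))


def symmetric_graph(d, n):
    """ASSUMED rule (not stated in the abstract): i and j are linked
    when (i + j) mod n is in d. Self-loops are dropped."""
    members = set(d)
    return {i: {j for j in range(n) if j != i and (i + j) % n in members}
            for i in range(n)}


def diameter(adj):
    """Largest shortest-path hop count over all node pairs (BFS from each node)."""
    worst = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        worst = max(worst, max(dist.values()))
    return worst


# Order q = 2: N = q*q + q + 1 = 7 nodes, difference set of size q + 1 = 3.
D7, N7 = (0, 1, 3), 7
graph7 = symmetric_graph(D7, N7)
print(is_perfect_difference_set(D7, N7))            # difference-set check
print(max(len(nbrs) for nbrs in graph7.values()))   # maximum node degree
print(diameter(graph7))                             # worst-case hop count
```

For this small case the checks report a perfect difference set, a maximum degree of 3 (that is, q + 1) and a diameter of 2. Under the same assumption, a rule based on differences (i - j) rather than sums would need links for both +d and -d, giving roughly 2(q + 1) links per node, which is consistent with the abstract's claim of about half the degree.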
