An Efficient All-to-all Communication Algorithm for Mesh/Torus Networks

An efficient all-to-all communication algorithm for torus and mesh networks, A2AT, was proposed. A2AT schedules the order in which messages are sent so that all links are fully used, exploiting each node's ability to transfer multiple messages concurrently. With A2AT, for every message transfer, the hop count of a message equals the maximum number of messages sharing a link on its route. A2AT can therefore maintain synchronization without phasing operations such as an MPI barrier.

When virtual output queuing (VOQ), the ideal configuration for A2AT, was used, the communication times obtained by A2AT on mesh and torus networks were, on average, roughly 1.20 and 1.09 times the ideal times, respectively. When the networks had the minimum number of virtual channels and small buffers, as in a practical network, A2AT reduced communication times by 12.5% and 36.0% compared with the conventional algorithm. With two controllers, A2AT reduced communication time by 28.2% and 55.7% compared with A2AND on 15×15×15 (3,375-node) mesh and torus networks, respectively (18.6% and 44.8% on average). With six controllers, A2AT reduced communication time by 15.1% and 41.9% compared with A2AND on the same mesh and torus networks, respectively (14.4% and 37.5% on average).
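The argument above turns on two quantities: how many hops each message travels and how many messages share each link. The Python sketch below does not implement A2AT, whose schedule the abstract does not specify. It only counts those two quantities for a plain all-to-all exchange on a small mesh or torus. It assumes dimension-order minimal routing, one message per source-destination pair, and ties on even-sized torus rings broken in the positive direction. The helper names `route` and `all_to_all_contention` are hypothetical.

```python
from collections import Counter
from itertools import product


def route(src, dst, dims, torus):
    """Dimension-order minimal route from src to dst (hypothetical helper).

    Corrects one axis at a time (axis 0 first) and returns the list of
    directed links (from_node, to_node) the message traverses.
    """
    cur = list(src)
    links = []
    for axis, k in enumerate(dims):
        if torus:
            fwd = (dst[axis] - cur[axis]) % k  # distance going in the + direction
            # Take the shorter way around; ties go in the + direction (assumed).
            step, hops = (1, fwd) if fwd <= k - fwd else (-1, k - fwd)
        else:
            diff = dst[axis] - cur[axis]
            step, hops = (1 if diff >= 0 else -1), abs(diff)
        for _ in range(hops):
            nxt = list(cur)
            nxt[axis] = (cur[axis] + step) % k if torus else cur[axis] + step
            links.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return links


def all_to_all_contention(dims, torus):
    """Return (max hop count, max messages sharing one directed link).

    Every node sends one message to every other node. Link loads are
    summed over the whole exchange, not per scheduling step.
    """
    nodes = list(product(*(range(k) for k in dims)))
    load = Counter()
    max_hops = 0
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            path = route(src, dst, dims, torus)
            max_hops = max(max_hops, len(path))
            load.update(path)  # count each directed link on the path
    return max_hops, max(load.values())


if __name__ == "__main__":
    print(all_to_all_contention((4,), torus=True))     # (2, 3)
    print(all_to_all_contention((4,), torus=False))    # (3, 4)
    print(all_to_all_contention((3, 3), torus=True))   # (2, 3)
    print(all_to_all_contention((3, 3), torus=False))  # (4, 6)
```

On the 4-node ring, the busiest directed link carries 3 messages, but no message travels more than 2 hops. If each link carries one message per time unit, the exchange therefore needs at least 3 time units on that link, whatever the send order. The mesh shows heavier central-link loads than the torus of the same size, which is consistent with the larger mesh gaps reported above.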
