Pipelined All-to-All Broadcast in All-Port Meshes and Tori

All-to-all communication is one of the densest communication patterns and occurs in many important parallel computing applications. The authors present a new all-to-all broadcast algorithm for all-port meshes and tori. The algorithm uses controlled message flooding based on a novel broadcast pattern that balances the traffic load across all dimensions of the network, so that the optimal transmission time for all-to-all broadcast can be achieved. The broadcast pattern is described formally and generically for each node in terms of a few simple operations and can easily be built into router hardware. Unlike existing all-to-all broadcast algorithms, the new algorithm overlaps message switching time with transmission time in a pipelined fashion to reduce the total communication delay. In most cases, the total communication delay is within a small constant of the lower bound for all-to-all broadcast. Finally, the algorithm is conceptually simple and symmetrical for every message and every node, so it can be implemented easily in hardware and can achieve the optimum in practice.
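To give a feel for the kind of lower bound on transmission time mentioned above, the following minimal sketch computes the standard port-counting bound. It is not taken from the paper and may differ from the exact bound the authors use. It assumes every node contributes one message, each link delivers at most one message per step, and every dimension has at least 3 nodes. The function name `port_counting_lower_bound` and its parameters are hypothetical.

```python
from math import ceil, prod


def port_counting_lower_bound(dims, torus=True):
    """Lower bound on transmission steps for all-to-all broadcast.

    Assumptions (not from the paper): one message per node, one message
    per link per step, and every dimension size is at least 3.
    """
    if any(k < 3 for k in dims):
        raise ValueError("this sketch assumes every dimension has >= 3 nodes")
    n_nodes = prod(dims)
    n_dims = len(dims)
    # A torus node has two links per dimension. In a mesh, a corner node
    # has only one link per dimension, and it is the bottleneck receiver.
    min_ports = 2 * n_dims if torus else n_dims
    # Every node must receive the other n_nodes - 1 messages, and it can
    # accept at most min_ports of them in a single step.
    return ceil((n_nodes - 1) / min_ports)


if __name__ == "__main__":
    print(port_counting_lower_bound((4, 4), torus=True))
    print(port_counting_lower_bound((4, 4), torus=False))
    print(port_counting_lower_bound((8, 8, 8), torus=True))
```

The bound only counts how many messages a node can receive per step, so it says nothing about switching (start-up) time, which is the component the pipelined algorithm overlaps with transmission.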
