Optimal All-to-All Personalized Exchange in Self-Routable Multistage Networks

All-to-all personalized exchange is one of the densest collective communication patterns and occurs in many important parallel computing applications. Previous all-to-all personalized exchange algorithms were developed mainly for hypercube and mesh/torus networks. Although the hypercube algorithms can achieve optimal time complexity, the hypercube has an unbounded node degree (it grows with network size), so it scales poorly given the limited number of I/O ports on a processor. A mesh/torus, on the other hand, has a constant node degree and therefore scales better in this respect, but its all-to-all personalized exchange algorithms have higher time complexity. In this paper, we propose an alternative approach to efficient all-to-all personalized exchange that uses another important class of networks for parallel computing systems: multistage networks. We present a new all-to-all personalized exchange algorithm for a class of unique-path, self-routable multistage networks. We first develop a generic method for decomposing an all-to-all personalized exchange pattern into permutations that are realizable in these networks, and then present a new all-to-all personalized exchange algorithm based on this method. The proposed algorithm has O(n) time complexity for an n×n network, which is optimal for all-to-all personalized exchange, since each processor must receive n-1 distinct messages through its single input port. By taking advantage of the fast switch setting of self-routable switches and the single input/output port per processor in a multistage network, we believe that a multistage network could be a better choice for implementing all-to-all personalized exchange owing to its shorter communication latency and better scalability.
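The abstract does not spell out the decomposition, so the sketch below is an illustration under stated assumptions, not the paper's construction. It assumes one standard choice: in round k, processor i sends to processor i XOR k, which splits the exchange into n permutations forming an n×n Latin square. It then checks each round against an omega network, one well-known unique-path, self-routable multistage network, using destination-tag routing. The XOR schedule, the omega topology, and the names `xor_schedule` and `omega_route_conflict_free` are all hypothetical choices made for this example.

```python
from typing import List


def xor_schedule(n: int) -> List[List[int]]:
    """Hypothetical decomposition: in round k, processor i sends to i XOR k.

    Each row is a permutation of 0..n-1, and the n rows together form an
    n x n Latin square, so every (source, destination) pair is covered
    exactly once in n rounds.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return [[i ^ k for i in range(n)] for k in range(n)]


def omega_route_conflict_free(perm: List[int]) -> bool:
    """Check whether `perm` passes an n x n omega network in one round.

    Assumed model: log2(n) stages, each a perfect shuffle (rotate the
    link address left by one bit) followed by 2x2 switches that set the
    lowest address bit to the next destination bit, most significant bit
    first (destination-tag self-routing). A collision means two messages
    occupy the same link after some stage.
    """
    n = len(perm)
    if n < 1 or n & (n - 1):
        raise ValueError("network size must be a power of two")
    m = n.bit_length() - 1          # number of switch stages
    positions = list(range(n))      # current link address of each message
    for stage in range(m):
        bit = m - 1 - stage         # destination bit consumed at this stage
        nxt = []
        for src, pos in enumerate(positions):
            shuffled = ((pos << 1) | (pos >> (m - 1))) & (n - 1)
            tag = (perm[src] >> bit) & 1
            nxt.append((shuffled & ~1) | tag)
        if len(set(nxt)) != n:
            return False            # two messages collided on one link
        positions = nxt
    return positions == perm        # every message reached its destination


if __name__ == "__main__":
    schedule = xor_schedule(8)
    # All 8 rounds route without link conflicts.
    print(all(omega_route_conflict_free(row) for row in schedule))  # True
    # Counterexample: this 4x4 permutation collides at the first stage.
    print(omega_route_conflict_free([0, 2, 1, 3]))  # False
```

Each round is a single conflict-free pass through the network and there are n rounds, which is consistent with the O(n) bound stated above. Round k = 0 is the identity, where a processor "sends" to itself; a real implementation would likely skip it.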
