Efficient Scheduling of Complete Exchange on Clusters

Message-passing capability is one of the main performance limitations of clusters, and complete exchange is known to be the most severe communication pattern on all types of message-passing machines. In this paper, we focus on the practical issues of designing high-speed complete exchange algorithms on a commodity cluster interconnected by a non-blocking crossbar switch. Four complete exchange algorithms (shift exchange, pairwise exchange, group shuffle exchange, and synchronous shuffle exchange) are implemented and tested on a cluster platform. To avoid node and link contention, these algorithms schedule the communication at the packet level. They aim to fully utilize the available communication bandwidth in both the links and the switch, and to avoid the head-of-line problem, which would stall the pipelines and decrease overall efficiency. Both the analytical and the measured results show that the synchronous shuffle exchange algorithm achieves the best performance, reaching 97% of the available bandwidth in our tests. The group shuffle exchange algorithm performs almost as well as the synchronous shuffle exchange algorithm but scales better on an input-buffered switch. Both shift and pairwise exchanges become inefficient when exchanging small messages, as they cannot fill the network pipelines effectively. Performance studies of the four algorithms on both input-buffered and shared-buffered switches are also reported.
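To make the idea of a contention-free schedule concrete, the following is a minimal sketch of the textbook forms of two of the named patterns. It assumes the commonly used definitions (shift: in step k, node i sends to node (i + k) mod P; pairwise: in step k, node i exchanges with node i XOR k, with P a power of two) and works at the message level. It is not the packet-level implementation evaluated in the paper, and all function names are hypothetical.

```python
def shift_schedule(p):
    """Assumed textbook shift exchange: in step k, node i sends to (i + k) % p."""
    return [[(i, (i + k) % p) for i in range(p)] for k in range(1, p)]


def pairwise_schedule(p):
    """Assumed textbook pairwise exchange: in step k, node i pairs with i XOR k.

    Requires p to be a power of two so that i ^ k stays within 0..p-1.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("pairwise exchange sketch requires p to be a power of two")
    return [[(i, i ^ k) for i in range(p)] for k in range(1, p)]


def is_contention_free(schedule, p):
    """True if, in every step, each node sends exactly once and receives exactly once."""
    everyone = list(range(p))
    for step in schedule:
        senders = sorted(src for src, _ in step)
        receivers = sorted(dst for _, dst in step)
        if senders != everyone or receivers != everyone:
            return False
    return True


def covers_all_pairs(schedule, p):
    """True if every ordered pair (i, j) with i != j appears in some step."""
    seen = {pair for step in schedule for pair in step}
    return seen == {(i, j) for i in range(p) for j in range(p) if i != j}


if __name__ == "__main__":
    for name, build in (("shift", shift_schedule), ("pairwise", pairwise_schedule)):
        sched = build(8)
        print(name, len(sched), is_contention_free(sched, 8), covers_all_pairs(sched, 8))
```

Both schedules finish in P - 1 steps, and in each step every node is the source of exactly one message and the destination of exactly one message, which is the property that avoids node contention on a non-blocking crossbar.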
