Message scheduling for all-to-all personalized communication on Ethernet switched clusters

We develop a message scheduling scheme that can theoretically achieve the maximum throughput for all-to-all personalized communication (AAPC) on any given Ethernet switched cluster. Based on this scheme, we implement an automatic routine generator that takes the network topology as input and produces a customized MPI_Alltoall routine, the routine in the Message Passing Interface (MPI) standard that realizes AAPC. Experimental results show that, when the message size is sufficiently large, the automatically generated routine consistently outperforms other MPI_Alltoall algorithms, including those in LAM/MPI and MPICH, on Ethernet switched clusters with different network topologies. This demonstrates the superiority of the proposed AAPC algorithm in exploiting network bandwidth.
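The notion of "maximum throughput" can be made concrete with a simple counting argument, sketched below under assumptions that are ours rather than the paper's. We model the cluster as a tree of switches and machines, treat every link as full-duplex with the same per-direction bandwidth, and have every machine send `msize` bytes to every other machine. Cutting any link splits the p machines into groups of sizes k and p - k, so k(p - k) messages must cross that link in each direction. No schedule can therefore finish faster than the most heavily loaded (bottleneck) link allows. The function `aapc_lower_bound` is a hypothetical helper for illustration. It computes only this bound and is not the paper's scheduling algorithm.

```python
from collections import defaultdict


def aapc_lower_bound(edges, machines, msize, bandwidth):
    """Return (bottleneck_edge, lower_bound_seconds) for AAPC on a tree topology.

    Assumptions (illustrative, not from the paper):
      edges     -- list of undirected (u, v) links; the graph must be a tree
      machines  -- set of node names that are compute machines
      msize     -- bytes each machine sends to every other machine
      bandwidth -- per-direction link bandwidth in bytes/second, equal for all links
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    total = len(machines)
    root = next(iter(adj))

    # Iterative DFS from an arbitrary root, recording each node's parent and
    # a preorder so that every child appears after its parent.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nxt in adj[node]:
            if nxt != parent[node]:
                parent[nxt] = node
                stack.append(nxt)

    # Count machines in each subtree, visiting children before parents.
    below = {}
    for node in reversed(order):
        below[node] = (1 if node in machines else 0) + sum(
            below[c] for c in adj[node] if c != parent[node]
        )

    # The link (parent, node) separates below[node] machines from the rest,
    # so below[node] * (total - below[node]) messages cross it in each direction.
    best_edge, best_load = None, -1
    for node in order:
        if parent[node] is None:
            continue
        k = below[node]
        load = k * (total - k)
        if load > best_load:
            best_edge, best_load = (parent[node], node), load

    return best_edge, best_load * msize / bandwidth
```

As an example, take two switches joined by one link, with four machines on `s0` and two on `s1`, 1 MB messages, and roughly 100 Mbps links:

```python
edges = [("s0", "s1")] + [("s0", f"n{i}") for i in range(4)] + [("s1", f"n{i}") for i in range(4, 6)]
machines = {f"n{i}" for i in range(6)}
print(aapc_lower_bound(edges, machines, 1_000_000, 12_500_000))  # (('s0', 's1'), 0.64)
```

Here the inter-switch link carries 2 x 4 = 8 messages in each direction, while each machine link carries only 5. The inter-switch link, not the machine links, therefore bounds the completion time.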
