Improved MPI All-to-all Communication on a Giganet SMP Cluster

We present the implementation of an improved, almost optimal algorithm for regular, personalized all-to-all communication on hierarchical multiprocessors such as clusters of SMP nodes. In MPI, this communication primitive is realized by the MPI_Alltoall collective. The algorithm is a natural generalization of a well-known factorization-based algorithm for non-hierarchical systems. A specific contribution of the paper is a completely contention-free scheme for exchanging messages between SMP nodes that does not rely on token passing. We describe a dedicated implementation for a small Giganet SMP cluster with 6 SMP nodes of 4 processors each. We present simple experiments to validate the assumptions underlying the design of the algorithm; the results were used to guide the detailed implementation of a crucial part of the algorithm. Finally, we compare the improved MPI_Alltoall collective to a trivial (but widely used) implementation and show improvements in average completion time that sometimes exceed 10%. While this may not seem like much, we have reason to believe that the improvements will be more substantial on larger systems.
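To illustrate the factorization idea that the hierarchical algorithm generalizes, the sketch below shows the classic 1-factor schedule for a flat (non-hierarchical) all-to-all exchange over an even number of processes: in each of the p-1 rounds, every process exchanges exactly one block with exactly one partner, so no process is the target of more than one message per round. This is only an illustrative sketch, not the paper's hierarchical algorithm; the function name alltoall_factor, the byte-block buffer layout, and the restriction to an even process count are assumptions made for the example.

#include <mpi.h>
#include <string.h>

/* Illustrative sketch of the classic 1-factor all-to-all schedule for an
 * even number of processes p (not the paper's hierarchical algorithm).
 * sendbuf/recvbuf hold p contiguous blocks of blocksize bytes each; block j
 * is destined for / received from rank j, as in MPI_Alltoall. */
static void alltoall_factor(const char *sendbuf, char *recvbuf,
                            int blocksize, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);              /* sketch assumes p is even */

    /* Own block is copied locally; no self-message is sent. */
    memcpy(recvbuf + (size_t)rank * blocksize,
           sendbuf + (size_t)rank * blocksize, (size_t)blocksize);

    for (int r = 0; r < p - 1; r++) {     /* p-1 rounds, one partner each */
        int partner;
        if (rank == p - 1)
            partner = r;                  /* "center" process pairs with r */
        else if (rank == r)
            partner = p - 1;
        else                              /* round-robin pairing on 0..p-2 */
            partner = ((2 * r - rank) % (p - 1) + (p - 1)) % (p - 1);

        /* Simultaneous exchange with this round's unique partner. */
        MPI_Sendrecv(sendbuf + (size_t)partner * blocksize, blocksize,
                     MPI_BYTE, partner, 0,
                     recvbuf + (size_t)partner * blocksize, blocksize,
                     MPI_BYTE, partner, 0, comm, MPI_STATUS_IGNORE);
    }
}

For comparison, the trivial implementation referred to in the abstract simply posts nonblocking sends and receives to all other processes at once and waits for completion, leaving the ordering of message transfers, and hence possible contention, to the underlying communication system.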
