A Comparison of MPICH Allgather Algorithms on Switched Networks

This study evaluates the performance of MPI_Allgather() in MPICH 1.2.5 on a Linux cluster. This implementation of MPICH improves on the performance of allgather compared to previous versions by using a recursive doubling algorithm. We have developed a dissemination allgather based on the dissemination barrier algorithm. This algorithm takes log2 p stages for any values of p. We experimentally evaluate MPICH allgather and our implementations on a Linux cluster of dual-processor nodes using both TCP over FastEthernet and GM over Myrinet. We show that on Myrinet, variations of the dissemination algorithm perform best for both large and small messages. However, when using TCP, the dissemination allgather algorithm performs poorly because data is not exchanged in a pair-wise fashion. Therefore, we recommend the dissemination allgather for low-latency switched networks.