Communication studies of single-threaded and multithreaded distributed-memory machines

This report examines the communication-overlapping capabilities of three distributed-memory machines: the SGI/Cray T3E, the IBM SP-2 with wide nodes, and the ETL EM-X. Bitonic sorting and the Fast Fourier Transform (FFT) are selected for the experiments. Various message sizes are used to determine when, where, how much, and why overlapping takes place. Experimental results with up to 64 processors indicate that the communication performance of the EM-X is insensitive to message size, while that of the SP-2 is the most sensitive; the T3E falls in between. The EM-X gave the highest communication-overlapping capability, while the T3E gave the lowest. The experimental results are compared with analytical results based on the LogP and LogGP communication models.
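As background on the cost models named in the abstract, the following is a minimal sketch of the standard LogGP point-to-point cost, in which one k-byte message takes o + (k-1)G + L + o; LogP's short-message cost L + 2o is the k = 1 case. The class `LogGP`, the helper `overlappable_time`, and all parameter values are hypothetical illustrations, not measured figures for the T3E, SP-2, or EM-X, and this is not the report's own analysis. The overlap window assumes a simplified reading in which the processor is busy only during the two overheads o, so the latency L and the per-byte transfer time (k-1)G could in principle be hidden behind computation.

```python
from dataclasses import dataclass


@dataclass
class LogGP:
    """Hypothetical container for LogGP parameters (times in microseconds)."""
    L: float  # network latency
    o: float  # CPU overhead paid once at the sender and once at the receiver
    g: float  # minimum gap between short messages (unused for a single message)
    G: float  # gap per byte for long messages
    P: int    # number of processors (unused for a single message)

    def message_time(self, k: int) -> float:
        """End-to-end time of one k-byte message: o + (k-1)G + L + o."""
        if k < 1:
            raise ValueError("message size must be at least 1 byte")
        return 2 * self.o + (k - 1) * self.G + self.L

    def overlappable_time(self, k: int) -> float:
        """Part of message_time not spent in CPU overhead (assumed hideable)."""
        return self.message_time(k) - 2 * self.o


if __name__ == "__main__":
    # Made-up parameters chosen only to show the arithmetic.
    m = LogGP(L=5.0, o=2.0, g=4.0, G=0.01, P=64)
    for k in (1, 1024):
        t = m.message_time(k)
        print(k, t, m.overlappable_time(k) / t)
```

With these illustrative numbers, a 1-byte message costs 9.0 us and a 1024-byte message about 19.23 us; the hideable fraction grows with message size because only the fixed overheads 2o are charged to the processor.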