Measurement and Prediction of Communication Delays in Myrinet Networks

This paper describes a series of experiments carried out to determine whether the delays of inter-node communication can be accurately predicted in a PC cluster interconnected by a Myrinet switch network. Prediction accuracy is affected not only by the software and hardware overhead involved in network communication but also by interference from concurrent message streams. Extensive measurements on a 14-node Myrinet cluster show that (1) the simple linear model typically used to model communication delay in networks is insufficient, and (2) the delay behavior of n message streams sharing a common link is more complicated than a simple divide-by-n model, in which each stream receives 1/n of the link bandwidth. A piecewise-linear model with experimentally derived parameters is proposed as a more accurate method of predicting communication delay when no communication links are shared. When two or more message streams share a common link, however, the communication delay is more accurately predicted as one of a set of discrete values.
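To make the contrast between the three delay models concrete, the following is a minimal Python sketch. It assumes the common form of the linear model, a fixed overhead plus message size times a per-byte cost, and treats the piecewise-linear model as a set of such lines, each valid above a size breakpoint. The class and function names (`Segment`, `linear_delay`, `piecewise_delay`, `naive_shared_delay`) and every numeric value are hypothetical placeholders, not parameters measured on the 14-node cluster. The sketch does not model the discrete-value behavior observed under link sharing, since the abstract does not say how those values are derived.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One piece of a hypothetical piecewise-linear delay model.

    The segment applies to message sizes >= start_bytes, up to the next
    segment's start. All fields are illustrative, not measured values.
    """
    start_bytes: int
    latency_us: float   # intercept: fixed software/hardware overhead (microseconds)
    us_per_byte: float  # slope: inverse of effective bandwidth (microseconds per byte)


def linear_delay(size_bytes: int, latency_us: float, us_per_byte: float) -> float:
    """Simple linear model: T(m) = t0 + m * c, with c = 1 / bandwidth."""
    return latency_us + size_bytes * us_per_byte


def piecewise_delay(size_bytes: int, segments: list[Segment]) -> float:
    """Select the segment covering size_bytes and apply its linear formula.

    `segments` must be sorted by start_bytes in ascending order.
    """
    starts = [s.start_bytes for s in segments]
    idx = bisect_right(starts, size_bytes) - 1
    if idx < 0:
        raise ValueError("message size is below the first segment's start")
    seg = segments[idx]
    return linear_delay(size_bytes, seg.latency_us, seg.us_per_byte)


def naive_shared_delay(size_bytes: int, n_streams: int,
                       latency_us: float, us_per_byte: float) -> float:
    """Divide-by-n assumption: each of n streams gets 1/n of the link
    bandwidth, so the per-byte cost is multiplied by n."""
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    return latency_us + size_bytes * us_per_byte * n_streams


if __name__ == "__main__":
    # Placeholder parameters chosen only to exercise the code.
    segs = [Segment(0, 10.0, 0.01), Segment(4096, 20.0, 0.008)]
    for m in (1000, 8192):
        print(m, piecewise_delay(m, segs), naive_shared_delay(m, 2, 10.0, 0.01))
```

Keying each segment by its starting size and locating it with `bisect_right` keeps lookup logarithmic in the number of breakpoints. It also lets segments have different intercepts as well as different slopes, which is the flexibility a single straight line lacks.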