A traditional measure of communication performance in a massively parallel processor (MPP) is the average message latency. In this paper we argue that the variance or, equivalently, the standard deviation of message latencies should be considered as an additional important measure in a multi-user MPP environment. Using a simulation-based approach, we investigate various schemes to reduce the standard deviation without adversely affecting the average latency. Simulations of a 16 × 16 wormhole-routed mesh architecture reveal that the standard deviation typically ranges from 25% to 50% of the average, depending on the message generation rate, the message length, the traffic pattern, and the routing algorithm. Since the differential blocking faced by messages is one of the major causes of the large standard deviation, we pursue two approaches to reducing it without degrading the average latency. The first controls the relative progress of different messages towards their destinations by prioritizing them dynamically. The second employs multiple virtual channels per physical channel. We compare the performance of several schemes for each of these two approaches, using deterministic routing as our base case. A distinguishing feature of our investigation is that the simulation model accounts for the degradation in network cycle time caused by the increased router complexity that the priority schemes and the virtual channels require. In fact, we show how our experimental results could have been misleading had we followed the traditional “cycle counting” approach.
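To make the two measures and the "cycle counting" pitfall concrete, the following is a minimal sketch in Python. It is not the simulation model of the paper: the latency samples, the function name `latency_summary`, and the 1.3× cycle-time penalty are all hypothetical values chosen only for illustration.

```python
from statistics import mean, pstdev


def latency_summary(latencies_cycles, cycle_time=1.0):
    """Return (average, standard deviation, std/avg) of message latencies.

    latencies_cycles: per-message latencies counted in network cycles.
    cycle_time: duration of one network cycle, relative to the base router
                (hypothetical parameter; 1.0 means no degradation).
    """
    times = [c * cycle_time for c in latencies_cycles]
    avg = mean(times)
    sd = pstdev(times)  # population standard deviation
    return avg, sd, sd / avg


# Hypothetical latencies (in cycles) for a base router and for a more
# complex router, e.g. one with extra virtual channels.
base_cycles = [40, 60, 80, 100, 120]
complex_cycles = [50, 60, 70, 80, 90]

# Pure cycle counting: both routers are assumed to have the same cycle time.
base = latency_summary(base_cycles)
counted = latency_summary(complex_cycles)

# Accounting for a slower clock in the more complex router
# (the 1.3x factor is an assumed value, not taken from the paper).
timed = latency_summary(complex_cycles, cycle_time=1.3)

print("base (avg, std, std/avg):           ", base)
print("complex, cycle counting only:       ", counted)
print("complex, with cycle-time degradation:", timed)
```

With these made-up numbers, the more complex router looks better on average when only cycles are counted, but worse once its longer cycle time is applied. The ratio of standard deviation to average is unchanged by the scaling, since both quantities scale by the same factor.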