Performance benefits of virtual channels and adaptive routing: an application-driven study

Recent research on multiprocessor interconnection networks has primarily focused on wormhole switching, virtual channel flow control, and routing algorithms. These architectural features aim to enhance network performance by reducing network latency, which in turn should improve overall system performance. Many research results support this design philosophy by claiming significant reductions in average message latency. However, these conclusions are drawn from synthetic workloads that may not capture the behavior of real applications. In this paper, we use parallel applications to examine network behavior more closely. In particular, we examine the performance benefit of enhancing a 2-D mesh with virtual channels (VCs) and with a routing algorithm (oblivious or fully adaptive), using five shared-memory applications and an execution-driven simulator, SPASM. To analyze the performance implications in greater detail, we also consider other parameters that have a direct bearing on network traffic: the number of processors used to solve a problem, the problem size, and the memory consistency model. Simulation results show that VCs can reduce network latency to varying degrees, depending on the application. A similar gain is possible with a fully adaptive routing algorithm compared to oblivious routing. However, the benefit of these enhancements on overall execution time is negligible. Moreover, this benefit is negated when the cost of implementing the VCs is taken into account. These results suggest that the performance rewards may not justify the cost of these enhancements. Rather, we need to focus on improving raw network bandwidth through simpler, improved router designs.
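To make the oblivious versus fully adaptive distinction concrete, here is a minimal Python sketch of the routing choices available at a node of a 2-D mesh. It assumes dimension-order (XY) routing as the oblivious algorithm and minimal fully adaptive routing (any direction that moves closer to the destination) as the adaptive one. The function names (`productive_dirs`, `xy_route`, `adaptive_route`, `count_paths`) are hypothetical, and the sketch ignores deadlock avoidance, which fully adaptive wormhole routers commonly handle with additional virtual channels. It is an illustration, not the router model used in SPASM.

```python
# Illustrative sketch only: hypothetical helpers contrasting oblivious (XY)
# and minimal fully adaptive routing in a 2-D mesh. Deadlock avoidance
# (e.g. via virtual channels) is deliberately NOT modeled.
from typing import Callable, List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in the mesh

def productive_dirs(cur: Node, dst: Node) -> List[str]:
    """Directions that move a message one hop closer to its destination."""
    dirs = []
    if dst[0] > cur[0]:
        dirs.append("+x")
    elif dst[0] < cur[0]:
        dirs.append("-x")
    if dst[1] > cur[1]:
        dirs.append("+y")
    elif dst[1] < cur[1]:
        dirs.append("-y")
    return dirs

def xy_route(cur: Node, dst: Node) -> List[str]:
    """Oblivious dimension-order routing: finish X first, then Y (one choice)."""
    return productive_dirs(cur, dst)[:1]

def adaptive_route(cur: Node, dst: Node) -> List[str]:
    """Minimal fully adaptive routing: every productive direction is allowed."""
    return productive_dirs(cur, dst)

STEP = {"+x": (1, 0), "-x": (-1, 0), "+y": (0, 1), "-y": (0, -1)}

def count_paths(cur: Node, dst: Node,
                route: Callable[[Node, Node], List[str]]) -> int:
    """Count the distinct source-to-destination paths a routing function permits."""
    if cur == dst:
        return 1
    total = 0
    for d in route(cur, dst):
        dx, dy = STEP[d]
        total += count_paths((cur[0] + dx, cur[1] + dy), dst, route)
    return total

if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))                      # ['+x']
    print(adaptive_route((0, 0), (2, 1)))                # ['+x', '+y']
    print(count_paths((0, 0), (2, 2), xy_route))         # 1
    print(count_paths((0, 0), (2, 2), adaptive_route))   # 6
```

Under XY routing a message always takes the same single path, whereas the adaptive function permits every minimal path (6 for a 2-by-2 offset). That path diversity is what lets adaptive routing steer around congestion and lower network latency, even though, per the results above, this need not translate into shorter overall execution time.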
