Simple virtual channel allocation for high throughput and high frequency on-chip routers

Technology scaling has led to the integration of many cores on a single chip. As a result, on-chip interconnection networks play an increasingly important role in determining the performance and power of the entire chip. Packet-switched networks-on-chip (NoCs) provide a scalable communication solution for tiled multi-core processors. However, the virtual-channel (VC) buffers in the NoC account for a significant share of the system's dynamic and leakage power. To improve the energy efficiency of the router, it is advantageous to use small buffers while still maintaining network throughput. This paper proposes two new virtual channel allocation (VA) mechanisms, termed Fixed VC Assignment with Dynamic VC Allocation (FVADA) and Adjustable VC Assignment with Dynamic VC Allocation (AVADA). The idea is to assign VCs based on the designated output port of a packet, which reduces Head-of-Line (HoL) blocking. In addition, the number of VCs allocated to each output port can be adjusted dynamically. Unlike previous buffer-pool based designs, we use only a small number of VCs to keep the arbitration latency low. Simulation results show that FVADA and AVADA improve network throughput by 41% on average compared to a baseline design with the same buffer size. AVADA still outperforms the baseline even when its buffer size is halved. Moreover, we achieve comparable or better throughput than a previous dynamic VC allocator while reducing the critical path delay by 60% relative to that allocator. These results show that the proposed VA mechanisms are suitable for low-power, high-throughput, and high-frequency on-chip network designs.
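
The idea of assigning VCs by output port can be illustrated with a toy software model. The sketch below is not the hardware design described in the paper; the class name `ToyVCAllocator`, the `home_vc` mapping, and the borrowing rule used in adjustable mode are all assumptions made for illustration. In fixed mode, a packet may only use the VC associated with its output port. In adjustable mode, it may also take any free VC when its own is busy, so the number of VCs serving one output port can grow with demand.

```python
class ToyVCAllocator:
    """Toy model of output-port-based VC assignment at one input port.

    Illustrative assumption only, not the paper's router microarchitecture.
    """

    def __init__(self, num_vcs, output_ports, adjustable=False):
        self.adjustable = adjustable
        # Hypothetical static mapping: output port -> its "home" VC index.
        self.home_vc = {port: i % num_vcs for i, port in enumerate(output_ports)}
        # owner[vc] is the output port of the packet holding that VC, or None.
        self.owner = [None] * num_vcs

    def allocate(self, out_port):
        """Return a VC index for a packet bound for out_port, or None if it must wait."""
        home = self.home_vc[out_port]
        if self.owner[home] is None:
            self.owner[home] = out_port
            return home
        if self.adjustable:
            # Assumed borrowing rule: take the lowest-numbered free VC.
            for vc, owner in enumerate(self.owner):
                if owner is None:
                    self.owner[vc] = out_port
                    return vc
        return None

    def release(self, vc):
        """Free a VC once its packet has left the buffer."""
        self.owner[vc] = None


ports = ["N", "E", "S", "W"]
fixed = ToyVCAllocator(4, ports)
adjustable = ToyVCAllocator(4, ports, adjustable=True)

fixed_first = fixed.allocate("E")
fixed_second = fixed.allocate("E")    # home VC busy, fixed mode must wait
adj_first = adjustable.allocate("E")
adj_second = adjustable.allocate("E") # home VC busy, borrows a free VC
```

Because packets in different VCs are bound for different output ports in fixed mode, a packet stalled on one output does not sit in front of packets headed elsewhere, which is the HoL-blocking effect the abstract describes. The adjustable variant trades that strict separation for better buffer use under uneven traffic.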
