A speculative arbiter design to enable high-frequency many-VC router in NoCs

High-performance network-on-chip (NoC) routers usually prefer a large number of virtual channels (VCs) for high throughput. However, a larger VC count increases arbitration complexity and reduces the router clock frequency. In this paper, we propose a novel high-frequency many-input arbiter design for many-VC routers. It is based on speculating on short, and thus fast, arbitrations when VC occupancy is high. We further enhance the design to reduce arbitration latency and increase speculation opportunities. Simulation results show that, with the proposed arbiter, a 16-VC router achieves almost the same performance as an ideal design, with improvements of around 48% in zero-load latency and 100% in network throughput over a naive 16-VC design.
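The abstract does not spell out the arbiter's internals, so the following is only an illustrative sketch of the general idea of speculating on a short arbitration, not the authors' design. It assumes a round-robin priority pointer, a hypothetical `window` of inputs examined by the short arbiter, and a one-cycle penalty when speculation misses; the function names and cycle counts are all assumptions made for illustration.

```python
def round_robin_grant(requests, pointer, span=None):
    """Return the index of the first asserted request found by scanning
    `span` positions starting at `pointer` (wrapping around), or None."""
    n = len(requests)
    span = n if span is None else span
    for offset in range(span):
        idx = (pointer + offset) % n
        if requests[idx]:
            return idx
    return None


def speculative_arbitrate(requests, pointer, window=4):
    """Return (grant, cycles) for one arbitration.

    Assumption: a short arbiter that inspects only `window` inputs is fast
    enough for a single cycle, while the full-width arbiter costs one more.
    """
    # Speculate: look only at the `window` inputs nearest the pointer.
    grant = round_robin_grant(requests, pointer, span=window)
    if grant is not None:
        # Hit: this is the same grant a full round-robin scan would give,
        # because the first request at or after the pointer is in the window.
        return grant, 1
    # Miss: fall back to the full-width arbitration (assumed extra cycle).
    return round_robin_grant(requests, pointer), 2


if __name__ == "__main__":
    busy = [True] * 16                      # high VC occupancy
    sparse = [i == 5 for i in range(16)]    # a single requesting VC
    print(speculative_arbitrate(busy, pointer=10))
    print(speculative_arbitrate(sparse, pointer=6))
```

In this toy model, the more inputs are requesting, the more likely the short window contains a request, which is why speculation would be expected to succeed most often under high occupancy.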
