A high performance router with dynamic buffer allocation for on-chip interconnect networks

With the number of processor cores increasing in chip multi-processors (CMPs) and global wire delays increasing, networks on chip have been gaining wide acceptance for on-chip inter-core communication. This paper introduces a low latency Dynamic Virtual Output Queues Router (DVOQR), which can reduce the router latency to two cycles by leveraging look-ahead routing computation and virtual output address queues scheme. Simulation results show that network throughput on a 4×4 mesh increases by up to 46.9% and 28.6%, compared to wormhole router and virtual channel router, and that DVOQR outperforms doubled buffer virtual channel router by 1.9% under same input speedup. Network zero-load-latency also decreases by 25.6% and 41% respectively under random traffic. The results with place and route used by Cadence Encounter in TSMC 65nm technology display that the frequency of DVOQR can reach 1.4 GHz, the cell area of the router is only 0.424mm2 and the power consumption is 274 mw under the 50% injection rate.

[1]  Chita R. Das,et al.  ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[2]  Ahmed Louri,et al.  iDEAL: Inter-router Dual-Function Energy and Area-Efficient Links for Network-on-Chip (NoC) Architectures , 2008, 2008 International Symposium on Computer Architecture.

[3]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[4]  Radu Marculescu,et al.  Energy- and performance-aware mapping for regular NoC architectures , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[5]  Yuval Tamir,et al.  High-performance multiqueue buffers for VLSI communication switches , 1988, [1988] The 15th Annual International Symposium on Computer Architecture. Conference Proceedings.

[6]  Samuel P. Morgan,et al.  Input Versus Output Queueing on a Space-Division Packet Switch , 1987, IEEE Trans. Commun..

[7]  Karthikeyan Sankaralingam,et al.  On-Chip Interconnection Networks of the TRIPS Chip , 2007, IEEE Micro.

[8]  Simon W. Moore,et al.  The design and implementation of a low-latency on-chip network , 2006, Asia and South Pacific Conference on Design Automation, 2006..

[9]  W. Dally,et al.  Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[10]  Saurabh Dighe,et al.  An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[11]  A. Kumary,et al.  A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007 .

[12]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[13]  Thomas G. Robertazzi,et al.  Input Versus Output Queueing on a SpaceDivision Packet Switch , 1993 .

[14]  Mikko H. Lipasti,et al.  Circuit-Switched Coherence , 2007, IEEE Comput. Archit. Lett..

[15]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.