A Delay Model of Two-Cycle NoC Router in 2D-Mesh Network

With the increasing number of processor cores in chip multi-processors (CMPs), 2D Mesh has been gaining wide acceptance for inter-core on-chip communication. Program performance is more sensitive to the router latency than to the link bandwidth. This paper presents a low latency Dynamic Virtual Output Queues Router (DVOQR), which can reduce the router latency to two cycles by leveraging look-ahead routing computation and virtual output queues scheme. The router delay model of DVOQR is derived based on the logical effect theory, and validated against Synopsys PrimeTime in TSMC 65nm technology. Results show that the critical path delay of the flit switch stage is more sensitive to the depth of unified dynamic buffer than to the link bandwidth. The critical path delay of DVOQR increases FO4 when the depth of Unified Dynamic Buffer (UDB) doubled. The frequency of DVOQR can reach 2.5GHz, which is improved by 20% compared to the virtual channel router, while the router only takes up 0.404mm2.

[1]  Yuval Tamir,et al.  High-performance multiqueue buffers for VLSI communication switches , 1988, [1988] The 15th Annual International Symposium on Computer Architecture. Conference Proceedings.

[2]  Karthikeyan Sankaralingam,et al.  On-Chip Interconnection Networks of the TRIPS Chip , 2007, IEEE Micro.

[3]  Simon W. Moore,et al.  The design and implementation of a low-latency on-chip network , 2006, Asia and South Pacific Conference on Design Automation, 2006..

[4]  W. Dally,et al.  Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[5]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[6]  Samuel P. Morgan,et al.  Input Versus Output Queueing on a Space-Division Packet Switch , 1987, IEEE Trans. Commun..

[7]  Radu Marculescu,et al.  Energy- and performance-aware mapping for regular NoC architectures , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[8]  A. Kumary,et al.  A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007 .

[9]  Saurabh Dighe,et al.  An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[10]  Mikko H. Lipasti,et al.  Circuit-Switched Coherence , 2007, IEEE Comput. Archit. Lett..

[11]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.