Extending the Effective Throughput of NoCs With Distributed Shared-Buffer Routers

Router microarchitecture plays a central role in the performance of networks-on-chip (NoCs). Routers need buffers to hold incoming flits that cannot be forwarded immediately because of contention. This buffering can be placed at the inputs or the outputs of a router, yielding an input-buffered router (IBR) or an output-buffered router (OBR), respectively. OBRs are attractive because, under high loads, they sustain higher throughput and incur lower queuing delays than IBRs. However, a direct implementation of an OBR requires a router speedup equal to the number of ports. This makes such a design prohibitive given the aggressive clock targets and limited power budgets of most NoC applications. This paper proposes a new router design based on a distributed shared-buffer (DSB) architecture that aims to emulate an OBR in practice. The proposed architecture introduces innovations that address the unique constraints of NoCs, including efficient pipelining and novel flow control. Practical DSB configurations with reduced power overheads and negligible performance degradation are also presented. Compared to a state-of-the-art pipelined IBR, the proposed DSB router achieves up to 19% higher throughput on synthetic traffic and reduces packet latency by 61% on average when running SPLASH-2 benchmarks with high contention. On average, the saturation throughput of DSB routers is within 7% of the theoretically ideal saturation throughput under the synthetic workloads evaluated.
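The speedup requirement of a direct OBR implementation can be illustrated with a small cycle-level model. This is a minimal sketch, assuming that each input port receives at most one flit per cycle and that an OBR writes every arriving flit straight into the buffer of its output port. The function names and the 5-port example are hypothetical and for illustration only; they are not taken from the paper. The worst case occurs when all inputs target the same output in the same cycle, so that output buffer must absorb one write per port.

```python
from collections import Counter


def output_writes_per_cycle(destinations):
    """Return the largest number of flits written into any single output
    buffer in one cycle of an idealized OBR (illustrative model).

    destinations: one entry per input port, holding the output port that
    input's arriving flit is headed to, or None if the input is idle.
    """
    counts = Counter(d for d in destinations if d is not None)
    return max(counts.values(), default=0)


def worst_case_speedup(num_ports):
    """Worst case: every input sends a flit to output 0 in the same cycle,
    so that output buffer must accept num_ports writes in one cycle."""
    return output_writes_per_cycle([0] * num_ports)


if __name__ == "__main__":
    # Hypothetical 5-port mesh router (N, S, E, W, local).
    print(output_writes_per_cycle([1, 2, 3, 4, 0]))     # permutation: 1 write per output
    print(output_writes_per_cycle([2, 2, None, 2, 0]))  # three flits collide on output 2
    print(worst_case_speedup(5))                        # all inputs to one output
```

Under permutation traffic each output buffer receives only one write per cycle. Because the router cannot know its traffic in advance, however, the fabric and output buffers must be provisioned for the worst case, which is a speedup equal to the number of ports.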
