TS-Router: On maximizing the Quality-of-Allocation in the On-Chip Network

Switch allocation is a critical pipeline stage in the router of an Network-on-Chip (NoC), in which flits in the input ports of the router are assigned to the output ports for forwarding. This allocation is in essence a matching between the input requests and output port resources. Efficient router designs strive to maximize the matching. Previous research considers the allocation decision at each cycle either independently or depending on prior allocations. In this paper, we demonstrate that the matching decisions made in a router along time actually form a time series, and the Quality-of-Allocation (QoA) can be maximized if the matching decision is made across the time series, from past history to future requests. Based on this observation, a novel router design, TS-Router, is proposed. TS-Router predicts future requests to arrive at a router and tries to maximize the matching across cycles. It can be extended easily from most state-of-the-art routers in a lightweight fashion. Our evaluation of TS-Router uses synthetic traffic as well as real benchmark programs in full-system simulator. The results show that TS-Router can have higher number of matchings and lower latency. In addition, a prototype of TS-Router is implemented in Verilog, so that power consumption and area overhead are also evaluated.

[1]  Simon W. Moore,et al.  Low-latency virtual-channel routers for on-chip networks , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[2]  W. Dally,et al.  Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[3]  Yuval Tamir,et al.  Symmetric Crossbar Arbiters for VLSI Communication Switches , 1993, IEEE Trans. Parallel Distributed Syst..

[4]  Tsutomu Yoshinaga,et al.  Prediction Router: A Low-Latency On-Chip Router Architecture with Multiple Predictors , 2011, IEEE Transactions on Computers.

[5]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[6]  Sharad Malik,et al.  Power-driven design of router microarchitectures in on-chip networks , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[7]  José Duato,et al.  High-radix crossbar switches enabled by Proximity Communication , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[8]  Sharad Malik,et al.  Orion: a power-performance simulator for interconnection networks , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[9]  Hideharu Amano,et al.  Prediction router: Yet another low latency on-chip router architecture , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[10]  Stephen W. Keckler,et al.  Netrace: dependency-driven trace-based network-on-chip simulation , 2010, NoCArc '10.

[11]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[12]  Z. Ding,et al.  A Near-optimal Real-time Hardware Scheduler for Large Cardinality Crossbar Switches , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[13]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[14]  Niraj K. Jha,et al.  Express virtual channels: towards the ideal interconnection fabric , 2007, ISCA '07.

[15]  Dean M. Tullsen,et al.  Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[16]  Dean M. Tullsen,et al.  Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling , 2005, ISCA 2005.

[17]  William J. Dally,et al.  Allocator implementations for network-on-chip routers , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[18]  Niraj K. Jha,et al.  GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[19]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[20]  Nan Jiang,et al.  Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks , 2011, IEEE Computer Architecture Letters.

[21]  George Michelogiannakis,et al.  An analysis of on-chip interconnection networks for large-scale chip multiprocessors , 2010, TACO.

[22]  Nick McKeown,et al.  The iSLIP scheduling algorithm for input-queued switches , 1999, TNET.

[23]  Cyriel Minkenberg,et al.  Speculative Flow Control for High-Radix Datacenter Interconnect Routers , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[24]  Tor M. Aamodt,et al.  Complexity effective memory access scheduling for many-core accelerator architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[25]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[26]  Eun Jung Kim,et al.  Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[27]  D. R. Fulkerson,et al.  Maximal Flow Through a Network , 1956 .

[28]  William J. Dally Virtual-channel flow control , 1990, ISCA '90.

[29]  Mike Galles Spider: a high-speed network interconnect , 1997, IEEE Micro.

[30]  William J. Dally,et al.  Microarchitecture of a high radix router , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).