ShortPath: A Network-on-Chip Router with Fine-Grained Pipeline Bypassing

Scalable Network-on-Chip (NoC) architectures should achieve high throughput and low latency without exceeding the stringent area and energy constraints of modern Systems-on-Chip (SoCs), even at high clock frequencies. These requirements directly affect the routers and network interfaces that make up the NoC. This paper focuses on the microarchitecture of NoC routers and presents ShortPath, a pipelined router architecture that enables high-speed implementations by parallelizing, as much as possible and without resorting to speculation, the allocation steps involved in the operation of a virtual-channel (VC) based router. Most importantly, ShortPath is augmented with a fine-grained pipeline bypassing mechanism that skips all stages without contention and “fast-forwards” flits to the first point of contention. Pipeline bypassing in ShortPath is always productive: even if a flit loses arbitration, it does not repeat any of the stages it has already bypassed. Extensive network simulations and hardware analysis, using standard-cell synthesis and placed-and-routed layouts, corroborate the efficiency of ShortPath in terms of both network performance and hardware complexity, compared with the most relevant state-of-the-art architecture.
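The "always productive" bypass property can be illustrated with a toy model. The sketch below is not ShortPath's implementation; it assumes generic VC-router stage names (route computation RC, VC allocation VA, switch allocation SA, switch traversal ST) chosen only for illustration, and hypothetical helpers `stage_trace` and `never_repeats_bypassed`. A stage without contention costs the flit no cycle, a contended stage costs one cycle per arbitration attempt, and a lost arbitration causes a retry of that stage only, never a return to earlier stages.

```python
# Generic VC-router stage names, chosen for illustration only; the abstract
# does not list ShortPath's actual pipeline stages.
STAGES = ["RC", "VA", "SA", "ST"]


def stage_trace(contended: set[str], losses: dict[str, int]) -> list[str]:
    """Return, in order, the stages in which a flit spends a cycle.

    Toy model of fine-grained bypassing: a stage with no contention is
    skipped outright and the flit is fast-forwarded past it. At a contended
    stage the flit spends one cycle per arbitration attempt (1 + losses)
    and retries only that stage; it never re-executes a bypassed stage.
    """
    trace: list[str] = []
    for name in STAGES:
        if name not in contended:
            continue  # bypassed: no contention, so no cycle spent here
        attempts = 1 + losses.get(name, 0)
        trace.extend([name] * attempts)
    return trace


def never_repeats_bypassed(trace: list[str]) -> bool:
    """True if the stage indices in the trace never move backwards."""
    idx = [STAGES.index(s) for s in trace]
    return all(a <= b for a, b in zip(idx, idx[1:]))


if __name__ == "__main__":
    # Contention only at switch allocation, where the flit loses twice.
    t = stage_trace(contended={"SA"}, losses={"SA": 2})
    print(t)  # ['SA', 'SA', 'SA']
    print(never_repeats_bypassed(t))  # True
```

In this model a flit that meets no contention at all spends no cycles in any stage, and repeated losses at one stage only lengthen its wait at that stage, which mirrors the abstract's claim that bypassing is never wasted work.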