Prediction Router: A Low-Latency On-Chip Router Architecture with Multiple Predictors

Multi- and many-core applications are sensitive to interprocessor communication latencies, suggesting the need for low-latency on-chip networks. We propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes the switch arbitration to reduce communication latency. If the prediction hits, packets entering a prediction router are transferred without waiting for the routing computation and switch arbitration. Thus, the primary concern in reducing communication latency is the hit rate of the prediction algorithm, which varies with the network environment, such as the network topology, routing algorithm, and traffic pattern. Typical low-latency routers that skip one or more pipeline stages use a bypass data path based on a static or single bypassing policy (e.g., accelerating the packets moving in the same dimension). In contrast, our prediction router architecture predictively forwards packets using a prediction algorithm selected from among several candidates in response to the network environment. We analyze the prediction hit rates of five prediction algorithms on meshes, tori, fat trees, and Spidergons. Then, we present four case studies, each of which assumes a different many-core architecture. We implemented the prediction routers for each case study using a 45 nm CMOS process, and evaluated them in terms of prediction hit rate, zero-load latency, hardware area, and energy consumption. For a typical prediction router with two or three predictors, area and energy increase by 4.8-12.0 percent and 5.3 percent, respectively, while a prediction hit rate of up to 89.8 percent is achieved in real applications. This is a favorable trade-off between modest hardware and energy overheads and significant latency savings.
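
To make the relationship between prediction hit rate and per-hop latency concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical "last port" predictor (guess that the next packet on an input channel uses the same output port as the previous one), which is not necessarily one of the five algorithms analyzed in the paper, and the cycle counts `hit_cycles` and `miss_cycles` are illustrative assumptions, not measured values.

```python
from typing import Iterable, Optional


class LastPortPredictor:
    """Hypothetical predictor: guess that the next packet on this input
    channel leaves through the same output port as the previous packet."""

    def __init__(self) -> None:
        self.last_port: Optional[int] = None  # no history before the first packet

    def predict(self) -> Optional[int]:
        return self.last_port

    def update(self, actual_port: int) -> None:
        self.last_port = actual_port


def hit_rate(ports: Iterable[int]) -> float:
    """Fraction of packets whose output port matched the prediction."""
    predictor = LastPortPredictor()
    hits = 0
    total = 0
    for actual in ports:
        if predictor.predict() == actual:
            hits += 1
        predictor.update(actual)
        total += 1
    return hits / total if total else 0.0


def average_hop_cycles(rate: float, hit_cycles: int = 1, miss_cycles: int = 3) -> float:
    """Expected per-hop router delay for a given hit rate.

    hit_cycles and miss_cycles are assumed values for illustration only:
    a hit skips routing computation and switch arbitration, a miss does not.
    """
    return rate * hit_cycles + (1.0 - rate) * miss_cycles


if __name__ == "__main__":
    trace = [1, 1, 1, 2, 2]  # made-up sequence of output ports on one input channel
    rate = hit_rate(trace)
    print(f"hit rate: {rate:.2f}, expected cycles per hop: {average_hop_cycles(rate):.2f}")
```

Under these assumptions, a higher hit rate moves the expected per-hop delay toward the single-cycle case, which is why the choice of predictor for a given topology and traffic pattern matters.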
