SDPR: Improving Latency and Bandwidth in On-Chip Interconnect Through Simultaneous Dual-Path Routing

Networks-on-chip (NoCs) are gaining popularity as a replacement for shared-medium interconnects in chip multiprocessors (CMPs) and multiprocessor systems-on-chip, and their performance is becoming essential to overall system performance. A growing body of work aims to improve the power and energy efficiency of NoCs without degrading performance. However, the mechanisms these power-efficient approaches rely on still introduce non-negligible latency. To alleviate this latency problem and transfer data efficiently while keeping interconnect resources highly utilized, we propose an on-chip network architecture that improves both latency and bandwidth. Widening the data paths and links across the network could considerably mitigate the problem, but it is costly in both device area and power. Instead, our dual-path router architecture efficiently exploits path diversity to achieve low latency without significant hardware overhead. By 1) doubling the number of injection and ejection ports, 2) splitting packets into two halves, 3) recomposing the routing policy to support path diversity, and 4) provisioning the network hardware design, we considerably improve network resource utilization and achieve much lower latency. On a 49-core CMP, the proposed simultaneous dual-path routing (SDPR) scheme outperforms conventional dimension-order routing (DOR) across synthetic workloads, improving average latency by 31%–40% and throughput by up to 100%. Our synthesizable model of the SDPR router and network provides accurate power and area reports, which show that SDPR incurs insignificant overhead compared with the baseline XY DOR router.
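
The core idea above, splitting a packet in half and sending the halves over two minimal paths at once, can be illustrated with a small Python model. This is a simplified sketch under stated assumptions, not the SDPR router itself: it assumes the two paths are the XY and YX dimension-order routes (a hypothetical choice; the abstract does not specify SDPR's recomposed routing policy), and it uses a first-order zero-load latency model, T = H·t_r + ⌈(L/P)/b⌉, with hop count H, per-hop router delay t_r in cycles, packet size L bits, P link-disjoint paths, and link width b bits. Only the serialization term shrinks when the packet is split. All helper names (`xy_route`, `yx_route`, `split_packet`, `dual_path_plan`, `zero_load_latency`) are illustrative.

```python
import math

Coord = tuple[int, int]  # (x, y) router position in a 2D mesh


def xy_route(src: Coord, dst: Coord) -> list[Coord]:
    """Dimension-order route: traverse X first, then Y. Includes src and dst."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def yx_route(src: Coord, dst: Coord) -> list[Coord]:
    """Dimension-order route: traverse Y first, then X (XY on swapped axes)."""
    swapped = xy_route((src[1], src[0]), (dst[1], dst[0]))
    return [(x, y) for (y, x) in swapped]


def split_packet(flits: list) -> tuple[list, list]:
    """Split a packet into two halves; the first half takes the extra flit if odd."""
    mid = (len(flits) + 1) // 2
    return flits[:mid], flits[mid:]


def dual_path_plan(src: Coord, dst: Coord, flits: list):
    """Hypothetical dual-path plan: first half routed XY, second half routed YX.

    If src and dst share a row or column, both routes are the same path,
    so the two halves gain no link diversity.
    """
    first, second = split_packet(flits)
    return [(first, xy_route(src, dst)), (second, yx_route(src, dst))]


def zero_load_latency(hops: int, t_router: int, packet_bits: int,
                      link_bits: int, paths: int = 1) -> int:
    """First-order zero-load latency in cycles: H * t_r + ceil((L / P) / b).

    Assumes the `paths` routes are link-disjoint and have equal hop counts;
    contention and destination reassembly are ignored.
    """
    bits_per_path = math.ceil(packet_bits / paths)
    serialization = math.ceil(bits_per_path / link_bits)
    return hops * t_router + serialization


if __name__ == "__main__":
    plan = dual_path_plan((0, 0), (2, 1), list(range(8)))
    for half, path in plan:
        print(half, path)
    hops = len(plan[0][1]) - 1  # 3 hops on either minimal path
    print(zero_load_latency(hops, 3, 512, 128))           # single path: 13
    print(zero_load_latency(hops, 3, 512, 128, paths=2))  # dual path: 11
```

With a 512-bit packet on 128-bit links over three hops, splitting cuts the serialization term from four cycles to two in this model. The model deliberately leaves out contention and reassembly, and it offers no link diversity when source and destination share a row or column, because XY and YX then coincide.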
