Fast Network-on-Chip Design

In the previous chapter, we showed how resonant clocking can be used as a high-speed, low-power, stable on-chip clock generation and distribution scheme. In this chapter, we use such a clock to design a high-speed, source-synchronous, ring-based NoC architecture. In Sect. 3.1, we introduce our NoC design, which comprises extremely fast, intersecting source-synchronous data rings. These rings traverse the CMP in both the horizontal and vertical directions, providing complete connectivity among all the PEs in the CMP. In our approach, the interconnection network operates in a separate clock domain that runs significantly faster than the PE clocks, which allows inter-processor communication with minimal latency. We perform architectural simulations of the ring-based NoC in Sect. 3.2. We also propose a deadlock-free routing protocol for the source-synchronous ring-based NoC that uses link ordering and virtual-channel-based buffered flow control. Architectural results on synthetic and real traffic demonstrate that the source-synchronous ring-based NoC has significantly lower latency and a higher maximum sustained injection rate than a state-of-the-art mesh-based NoC. Next, in Sect. 3.3, we propose a modified source-synchronous design in which the PEs derive a low-jitter clock directly from the high-speed ring clock by frequency division, and are therefore synchronous with the NoC. This is feasible because of the extremely good jitter characteristics of the SWO-based clock generation and distribution scheme of Sect. 2.2. Using this modified design, we propose a class of source-synchronous NoCs organized in an H-tree topology, which consume less logic and wiring area than a state-of-the-art mesh. Architectural simulations on synthetic and real traffic show that our H-tree-based NoC designs provide significantly lower latency and sustain a higher injection rate than a state-of-the-art mesh.
Using the modified source-synchronous design proposed in Sect. 3.3, we also evaluate two additional floorplan-friendly NoC topologies in Sect. 3.4. These topologies consume significantly less logic and wiring area than a state-of-the-art mesh. Architectural simulations on synthetic and real traffic show that they provide significantly lower latency while achieving the same or better maximum sustained injection rate than a state-of-the-art mesh.
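The routing protocol mentioned above combines link ordering with virtual channels to avoid deadlock. As a minimal illustration of why such a combination works, the sketch below models a single unidirectional ring of `n` nodes (an assumption; the NoC in this chapter uses intersecting horizontal and vertical rings) and applies a classic dateline scheme, which is not necessarily the exact protocol developed in this chapter. Links are totally ordered by index; a packet starts on virtual channel 0 and moves to virtual channel 1 after it wraps past the last link, so every packet acquires channels in strictly increasing `(vc, link)` order. The channel dependency graph is then acyclic, which is the standard sufficient condition for deadlock freedom. The names `route`, `dependency_graph`, `has_cycle`, and `respects_link_order` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import product

# Hypothetical model: a channel is (link index, virtual channel).
# Link i carries flits from node i to node (i + 1) % n.
Channel = tuple[int, int]


def route(src: int, dst: int, n: int, use_dateline: bool) -> list[Channel]:
    """Channels used by a packet travelling clockwise from src to dst."""
    channels: list[Channel] = []
    vc = 0
    node = src
    while node != dst:
        link = node
        channels.append((link, vc))
        node = (node + 1) % n
        # Dateline: after crossing link n-1 (the wrap-around), switch to VC 1.
        if use_dateline and link == n - 1:
            vc = 1
    return channels


def dependency_graph(n: int, use_dateline: bool) -> dict[Channel, set[Channel]]:
    """Edge a -> b if some packet holds channel a while requesting channel b."""
    graph: dict[Channel, set[Channel]] = {}
    for src, dst in product(range(n), repeat=2):
        if src == dst:
            continue
        path = route(src, dst, n, use_dateline)
        for a, b in zip(path, path[1:]):
            graph.setdefault(a, set()).add(b)
    return graph


def has_cycle(graph: dict[Channel, set[Channel]]) -> bool:
    """A cyclic channel dependency graph means deadlock is possible."""
    try:
        # Edge direction does not affect whether a cycle exists.
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False


def respects_link_order(path: list[Channel]) -> bool:
    """True if channels are acquired in strictly increasing (vc, link) order."""
    keys = [(vc, link) for link, vc in path]
    return all(a < b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    n = 8
    print("single VC has cycle:", has_cycle(dependency_graph(n, use_dateline=False)))
    print("dateline VCs have cycle:", has_cycle(dependency_graph(n, use_dateline=True)))
```

With a single virtual channel, the wrap-around dependency closes a cycle around the ring, so the first check reports a cycle; with the dateline, every dependency points "upward" in the link order and the second check reports none. Adding the second virtual channel is what lets the link order be respected without restricting which destinations a packet can reach.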
