Efficient memory access in 2D Mesh NoC architectures using high bandwidth routers

Networks on Chip (NoC) have emerged as a promising interconnection technology for scalable many-core architectures. Proposed NoC-architectures and topologies often assume uniform distribution of traffic, where all tiles produce and consume the same amount of data. However, even in homogeneous many-core architectures the Network on Chip is used to access peripheral buses and memory controllers for off-chip memory access. These components can consume and generate a significant part of the overall traffic, thus having higher bandwidth requirements than processing tiles. In this work we propose High Bandwidth Routers replacing conventional routers within the NoC at the positions, where components with high bandwidth requirements are attached. A High Bandwidth Router design is proposed and investigated with respect to performance and implementation costs. The position of memory nodes within a tiled architecture is analyzed. The results show a significant improvement of throughput and reduction of latency for memory communication, with moderate additional implementation costs.

[1]  John Kim,et al.  Low-cost router microarchitecture for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Axel Jantsch,et al.  Guaranteed bandwidth using looped containers in temporally disjoint networks within the nostrum network on chip , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[3]  Saurabh Dighe,et al.  A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling , 2011, IEEE Journal of Solid-State Circuits.

[4]  Kees G. W. Goossens,et al.  An efficient on-chip NI offering guaranteed services, shared-memory abstraction, and flexible network configuration , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[5]  Jürgen Teich,et al.  Hardware-assisted Decentralized Resource Management for Networks on Chip with QoS , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.

[6]  Niraj K. Jha,et al.  A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007, ICCD.

[7]  A. Kumary,et al.  A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007 .

[8]  Gerard J. M. Smit,et al.  A virtual channel network-on-chip for GT and BE traffic , 2006, IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures (ISVLSI'06).

[9]  Natalie D. Enright Jerger,et al.  Achieving predictable performance through better memory controller placement in many-core CMPs , 2009, ISCA '09.

[10]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.

[11]  Sriram R. Vangal,et al.  A 2 Tb/s 6 × 4 Mesh Network for a Single-Chip Cloud Computer With DVFS in 45 nm CMOS , 2011, VLSIC 2011.

[12]  Andreas Herkersdorf,et al.  Hierarchical NoCs for Optimized Access to Shared Memory and IO Resources , 2009, 2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools.

[13]  William J. Dally,et al.  Virtual-channel flow control , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[14]  Aamer Jaleel,et al.  Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[15]  Jörg Henkel,et al.  Invasive manycore architectures , 2012, 17th Asia and South Pacific Design Automation Conference.

[16]  Jens Sparsø,et al.  A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip , 2005, Design, Automation and Test in Europe.

[17]  Kees Goossens,et al.  AEthereal network on chip: concepts, architectures, and implementations , 2005, IEEE Design & Test of Computers.

[18]  Jürgen Becker,et al.  A Scalable NoC Router Design Providing QoS Support Using Weighted Round Robin Scheduling , 2012, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.

[19]  Lionel M. Ni,et al.  A survey of wormhole routing techniques in direct networks , 1993, Computer.

[20]  Sriram R. Vangal,et al.  A 2 Tb/s 6$\,\times\,$ 4 Mesh Network for a Single-Chip Cloud Computer With DVFS in 45 nm CMOS , 2011, IEEE Journal of Solid-State Circuits.

[21]  Trevor N. Mudge,et al.  A performance comparison of contemporary DRAM architectures , 1999, ISCA.

[22]  Luca Benini,et al.  Networks on Chips : A New SoC Paradigm , 2022 .