Ghost routers: energy-efficient asymmetric multicore processors with symmetric NoCs

Asymmetric multicore architectures have been proposed to exploit the benefits of heterogeneous cores. However, asymmetric cores present a challenge to network-on-chip (NoC) designers because the floorplan is not necessarily regular: the "nodes" differ in size. In contrast, most previously proposed NoC topologies assume a regular, symmetric floorplan with equal-size nodes. In this work, we first describe how an asymmetric floorplan leads to an asymmetric topology and can limit overall performance. To overcome the asymmetric floorplan, we present ghost routers: extra "dummy" routers that are added to the NoC to create a symmetric NoC architecture for asymmetric multicore architectures. Ghost routers provide higher network path diversity and higher network performance, which in turn leads to higher system performance. Ghost routers also enable simpler routing algorithms because of the symmetric NoC architecture. While ghost routers are a simple modification to the NoC architecture, they do increase NoC cost. However, ghost routers exploit the observation that, in realistic systems, the cost of the NoC is not a significant fraction of overall system cost. Our evaluations show that ghost routers can improve performance by up to 21% while improving the overall energy efficiency of the system by up to 26%.
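
The effect of ghost routers on path diversity and routing simplicity can be illustrated with a toy model. The sketch below is not the evaluation setup of this work; it assumes a hypothetical 4x4 tile grid in which one large core covers a 2x2 block of tiles and owns a single router, so three grid positions have no router. Filling those three positions with routers that have no core attached restores a full mesh. All names (`build_mesh`, `xy_route_ok`, `count_shortest_paths`, `BIG_CORE_HOLES`) and the floorplan itself are assumptions made for illustration.

```python
from collections import deque
from itertools import product

# Hypothetical floorplan: a large core covers tiles (1,1)-(2,2) of a 4x4 grid
# and keeps its only router at (1,1); the other three positions have no router.
BIG_CORE_HOLES = frozenset({(2, 1), (1, 2), (2, 2)})

def build_mesh(width, height, missing=frozenset()):
    """Adjacency dict of a 2D mesh, with no router at positions in `missing`."""
    nodes = set(product(range(width), range(height))) - set(missing)
    return {
        (x, y): [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if (x + dx, y + dy) in nodes]
        for (x, y) in nodes
    }

def xy_route_ok(adj, src, dst):
    """True if X-then-Y dimension-order routing only visits existing routers."""
    x, y = src
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        if (x, y) not in adj:
            return False
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        if (x, y) not in adj:
            return False
    return True

def count_shortest_paths(adj, src, dst):
    """BFS returning (hop count, number of distinct shortest paths)."""
    dist, ways = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], ways[v] = dist[u] + 1, ways[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                ways[v] += ways[u]
    return dist.get(dst), ways.get(dst, 0)

def summarize(adj, endpoints):
    """(number of pairs where XY routing fails, average shortest-path hops)."""
    pairs = [(s, d) for s in endpoints for d in endpoints if s != d]
    xy_failures = sum(not xy_route_ok(adj, s, d) for s, d in pairs)
    total_hops = sum(count_shortest_paths(adj, s, d)[0] for s, d in pairs)
    return xy_failures, total_hops / len(pairs)

asymmetric = build_mesh(4, 4, missing=BIG_CORE_HOLES)  # routers only where cores have one
with_ghosts = build_mesh(4, 4)                         # ghost routers fill the holes
endpoints = sorted(asymmetric)                         # traffic still starts/ends at real routers

for name, adj in (("asymmetric", asymmetric), ("with ghost routers", with_ghosts)):
    failures, avg_hops = summarize(adj, endpoints)
    hops, paths = count_shortest_paths(adj, (0, 1), (3, 2))
    print(f"{name}: XY failures={failures}, avg hops={avg_hops:.2f}, "
          f"(0,1)->(3,2): {hops} hops over {paths} shortest paths")
```

In this toy floorplan, plain XY routing breaks for some source and destination pairs in the asymmetric topology and some packets must detour around the large core, whereas the filled-in mesh routes every pair with XY routing at Manhattan distance. The model ignores router area, power, and contention, which are exactly the costs the abstract argues are small relative to the overall system.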
