A hybrid packet/circuit-switched router to accelerate memory access in NoC-based chip multiprocessors

Modern chip multiprocessors will feature a large shared last-level cache (LLC) that is decomposed into smaller slices and physically distributed throughout the chip area. These architectures rely on a network-on-chip (NoC) to handle remote cache access and hence, NoCs play a critical role in optimizing memory access latency and power consumption. Circuit-switching is the most power- and performance-efficient switching mechanism in NoCs, but is not advantageous when the packet transmission time is not long enough compared to the circuit setup time. In this paper, we propose a zero-latency circuit setup scheme to make circuit-switching applicable in transferring individual data packets. The design leverages the fact that in CMPs with distributed LLC (where a considerable portion of the on-chip traffic is composed of remote LLC access requests and data responses), every response packet is sent in reply to a request packet and traverses the same path as its corresponding request, but at the backward direction. The short request packets, then, are responsible to reserve a path for their corresponding response packets. This NoC tries to reduce conflict among circuit paths by considering conflicts in backward direction during request packet routing, backed by a run-time technique to resolve conflicts when circuits are actually set up. Experimental results show that the proposed NoC architecture considerably reduces average packet latency that directly translates to faster memory access.

[1]  Li-Shiuan Peh,et al.  Breaking the on-chip latency barrier using SMART , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[2]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[3]  Andrew B. Kahng,et al.  Explicit modeling of control and data for improved NoC router estimation , 2012, DAC Design Automation Conference 2012.

[4]  Hamid Sarbazi-Azad,et al.  Virtual Point-to-Point Connections for NoCs , 2010, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[5]  Christoforos E. Kozyrakis,et al.  The ZCache: Decoupling Ways and Associativity , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[6]  Axel Jantsch,et al.  Architecture Support and Comparison of Three Memory Consistency Models in NoC Based Systems , 2012, 2012 15th Euromicro Conference on Digital System Design.

[7]  Babak Falsafi,et al.  Toward Dark Silicon in Servers , 2011, IEEE Micro.

[8]  Ran Ginosar,et al.  QNoC: QoS architecture and design process for network on chip , 2004, J. Syst. Archit..

[9]  Hamid Sarbazi-Azad,et al.  Application-Aware Topology Reconfiguration for On-Chip Networks , 2011, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[10]  Anantha Chandrakasan,et al.  SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[11]  Babak Falsafi,et al.  Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.