A hybrid NoC design for cache coherence optimization for chip multiprocessors

On chip many-core systems, evolving from prior multi-pro cessor systems, are considered as a promising solution to the performance scalability and power consumption problems. The long communication distance between the traditional multi-processors makes directory-based cache coherence protocols better solutions compared to bus-based snooping protocols even with the overheads from indirections. However, much smaller distances between the CMPcores enhance the reachability of buses, revitalizing the applicability of snooping protocols for cache-to-cache transfers. In this work, we propose a hybrid NoC design to provide optimized support for cache coherency. In our design, on-chip links can be dynamically configured as either point-to-point links between NoC nodes or short buses to facilitate localized snooping. By taking advantage of the best of both worlds, bus-based snooping coherency and NoC-based directory coherency, our approach brings both power and performance benefits.

[1]  W. Dally,et al.  Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[2]  Todd M. Austin,et al.  Polymorphic On-Chip Networks , 2008, 2008 International Symposium on Computer Architecture.

[3]  Rajeev Balasubramonian,et al.  Towards scalable, energy-efficient, bus-based on-chip networks , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[4]  Chita R. Das,et al.  Design and evaluation of a hierarchical on-chip interconnect for next-generation CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[5]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[6]  Karthik Ramani,et al.  Interconnect-Aware Coherence Protocols for Chip Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[7]  Mark D. Hill,et al.  Coherence Ordering for Ring-based Chip Multiprocessors , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[8]  Sharad Malik,et al.  Orion: a power-performance simulator for interconnection networks , 2002, MICRO.

[9]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[10]  Nick Barrow-Williams,et al.  Proximity coherence for chip multiprocessors , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[11]  Dean M. Tullsen,et al.  Proximity-aware directory-based coherence for multi-core processor architectures , 2007, SPAA '07.

[12]  Josep Torrellas,et al.  Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[13]  Natalie D. Enright Jerger,et al.  Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[14]  Seda Ogrenci Memik,et al.  Thermal sensor allocation and placement for reconfigurable systems , 2006, ICCAD.

[15]  Norman P. Jouppi,et al.  Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[16]  Rakesh Kumar,et al.  A Workload-Adaptive and Reconfigurable Bus Architecture for Multicore Processors , 2010, Int. J. Reconfigurable Comput..

[17]  Giuseppe Di Battista,et al.  26 Computer Networks , 2004 .

[18]  Chita R. Das,et al.  Design and analysis of an NoC architecture from performance, reliability and energy perspective , 2005, 2005 Symposium on Architectures for Networking and Communications Systems (ANCS).

[19]  Rudolf Eigenmann,et al.  SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.

[20]  Taewhan Kim,et al.  Thermal sensor allocation and placement for reconfigurable systems , 2009, TODE.