Using Heterogeneous Networks to Improve Energy Efficiency in Direct Coherence Protocols for Many-Core CMPs

Direct coherence protocols have recently been proposed as an alternative to directory-based protocols for maintaining cache coherence in many-core CMPs. Unlike directory-based protocols, direct coherence makes the cache responsible for supplying the requested data on a cache miss (i.e., the owner cache) also responsible for keeping the up-to-date directory information and serializing the accesses to the block by all cores. These protocols therefore send requests directly to the owner cache, avoiding the indirection caused by accessing a separate directory (usually in the home node). A hints mechanism ensures a high hit rate when predicting the current owner of a block, but at the price of significantly increased network traffic and, consequently, energy consumption. In this work, we show that a heterogeneous interconnection network composed of two kinds of links is enough to drastically reduce the energy consumed by hint messages, yielding significant improvements in energy efficiency.
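The idea can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the protocol evaluated in this work. The class `ToyDirectCoherence`, the helper `run`, the per-message energy units in `LINK_ENERGY`, and the policy of sending a hint to every other core on each ownership change are all hypothetical and chosen only for illustration. The sketch counts the energy of request, forward, and reply messages on baseline links, and routes hint messages over either baseline or low-power links.

```python
from dataclasses import dataclass, field

# Hypothetical energy units per message for each link type (illustrative only).
LINK_ENERGY = {"baseline": 10, "low_power": 4}


@dataclass
class Core:
    cid: int
    # Per-core owner predictions, filled in by hint messages: block -> core id.
    predicted_owner: dict = field(default_factory=dict)


class ToyDirectCoherence:
    """Toy model: the owner cache serializes accesses; hints update predictions."""

    def __init__(self, n_cores, hint_link="baseline"):
        self.cores = [Core(i) for i in range(n_cores)]
        self.owner = {}             # block -> current owner core id
        self.hint_link = hint_link  # link type used for hint messages
        self.energy = 0
        self.hits = 0               # correct owner predictions
        self.misses = 0             # wrong or missing owner predictions

    def _send(self, link):
        self.energy += LINK_ENERGY[link]

    def write_miss(self, cid, block):
        actual = self.owner.get(block)
        guess = self.cores[cid].predicted_owner.get(block)
        if actual is not None and actual != cid:
            self._send("baseline")      # request to the predicted owner (or home)
            if guess == actual:
                self.hits += 1
            else:
                self.misses += 1
                self._send("baseline")  # extra hop: forward to the real owner
            self._send("baseline")      # data reply to the requester
        self.owner[block] = cid         # ownership moves to the requester
        # Assumption: one hint per other core on every ownership change.
        for other in self.cores:
            if other.cid != cid:
                self._send(self.hint_link)
                other.predicted_owner[block] = cid


def run(hint_link):
    """Two cores write the same block in turn; returns (energy, hits, misses)."""
    sim = ToyDirectCoherence(n_cores=4, hint_link=hint_link)
    sim.write_miss(0, "A")
    sim.write_miss(1, "A")
    return sim.energy, sim.hits, sim.misses


if __name__ == "__main__":
    print(run("baseline"), run("low_power"))
```

In this toy, moving hints to the cheaper link leaves the prediction hit count unchanged and lowers only the energy attributed to hint messages. The model ignores latency, so it does not capture any delay cost of the slower links.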
