Helia: Heterogeneous Interconnect for Low Resolution Cache Access in snoop-based chip multiprocessors

In this work we introduce Heterogeneous Interconnect for Low Resolution Cache Access (Helia). Helia improves energy efficiency in snoop-based chip multiprocessors by eliminating unnecessary activity in both the interconnect and the cache. It does so by combining snoop filtering mechanisms with wire management techniques. Our optimizations rely on the observation that a high percentage of cache mismatches can be detected using a small but highly informative subset of the tag bits. Helia relies on the snoop controller to detect possible remote tag mismatches prior to the tag array lookup. Power is reduced because (a) our wire management techniques permit slow transmission of a subset of the tag bits while tag mismatches are being detected, and (b) we avoid the cache access for mismatches detected at the snoop controller. Our evaluation shows that Helia reduces power in the interconnect (dynamic: 64% to 75%, static: 45% to 50%) and in the cache tag array (dynamic: 57% to 58%, static: 80%) while improving average performance by up to 4.4%.
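The two-stage idea described above (a partial tag comparison at the snoop controller, followed by a full tag array lookup only when needed) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of the low-order tag bits as the "informative" subset, and the parameter `k` are all assumptions made for illustration only.

```python
from typing import List, Tuple


def low_bits(tag: int, k: int) -> int:
    """Return the k low-order bits of a tag (assumed 'informative' subset)."""
    return tag & ((1 << k) - 1)


def snoop_lookup(request_tag: int, set_tags: List[int], k: int) -> Tuple[bool, bool]:
    """Hypothetical two-stage snoop lookup.

    Returns (hit, tag_array_accessed).
    Stage 1 compares only k bits, standing in for the check at the
    snoop controller. Stage 2 is the full tag comparison, standing in
    for the tag array access.
    """
    partial = low_bits(request_tag, k)

    # Stage 1: if no cached tag shares the partial bits, the full tags
    # cannot match either, so the tag array access is skipped.
    if all(low_bits(t, k) != partial for t in set_tags):
        return False, False

    # Stage 2: a partial match is only a possible hit; confirm it with
    # the full tag comparison.
    return request_tag in set_tags, True


if __name__ == "__main__":
    cached = [0b101100, 0b110011]
    print(snoop_lookup(0b000001, cached, k=2))  # mismatch found in stage 1
    print(snoop_lookup(0b111100, cached, k=2))  # partial match, full mismatch
    print(snoop_lookup(0b110011, cached, k=2))  # full hit
```

In this sketch a stage 1 mismatch is always a true mismatch, because equal tags necessarily have equal low-order bits. A stage 1 match may still turn out to be a miss, which is why the full comparison is retained.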
