Reducing Snoop-Induced Power Consumption in Small-Scale, Bus-Based SMP Systems

We propose methods for reducing the power required to handle snoop requests in small-scale, snoop-coherent, bus-based SMP systems. Observing that a large fraction of snoops find no copy in the other caches, we introduce JETTY, a small, cache-like structure. A JETTY is placed between the bus and the L2 backside of each processor, where it acts as a filter for snoop requests. In particular, it filters the vast majority of snoops that would not find a locally cached copy. We propose a number of alternative JETTY methods that operate by identifying either a subset of the blocks that are not locally cached or a superset of the blocks that are. We demonstrate that, for a set of parallel applications and a 4-way SMP system, relatively small JETTY structures can filter, on average, up to 77% of all snoops that would miss. This results in an average power reduction of 41%, measured as a fraction of the power required for all snoop-induced tag-array
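The two filtering approaches named in the abstract can be illustrated with a small software model. The following is only a sketch under stated assumptions: the abstract does not describe the internal organization of a JETTY, so the class names, the counter array, the table capacity, and the 64-byte block size are all hypothetical choices made for illustration. The first class tracks a superset of locally cached blocks (it may report "possibly cached" for a block that is absent, but never the reverse); the second tracks a subset of blocks known not to be cached locally.

```python
from collections import OrderedDict


class SupersetFilter:
    """Hypothetical filter tracking a SUPERSET of locally cached blocks.

    Each counter records how many cached blocks map to it. A zero counter
    proves the block is absent, so the snoop can skip the L2 tag lookup.
    """

    def __init__(self, num_counters=64, block_bits=6):
        self.num_counters = num_counters
        self.block_bits = block_bits  # assumed 64-byte blocks
        self.counters = [0] * num_counters

    def _index(self, address):
        return (address >> self.block_bits) % self.num_counters

    def on_fill(self, address):
        self.counters[self._index(address)] += 1

    def on_evict(self, address):
        self.counters[self._index(address)] -= 1

    def may_be_cached(self, address):
        # True can be a false positive (aliasing); False is always exact.
        return self.counters[self._index(address)] > 0


class SubsetFilter:
    """Hypothetical table holding a SUBSET of blocks known to be absent."""

    def __init__(self, capacity=4, block_bits=6):
        self.capacity = capacity
        self.block_bits = block_bits
        self.absent = OrderedDict()  # block number -> None, oldest first

    def _block(self, address):
        return address >> self.block_bits

    def record_snoop_miss(self, address):
        self.absent[self._block(address)] = None
        if len(self.absent) > self.capacity:
            self.absent.popitem(last=False)  # drop the oldest entry

    def on_fill(self, address):
        # The block is now cached locally, so it must leave the table.
        self.absent.pop(self._block(address), None)

    def known_absent(self, address):
        return self._block(address) in self.absent


def snoop_is_filtered(address, superset, subset):
    """Return True if the snoop can skip the L2 tag array entirely."""
    return subset.known_absent(address) or not superset.may_be_cached(address)
```

In this model a filtered snoop never touches the tag array, which is where the power saving would come from; an unfiltered snoop proceeds to the normal tag lookup, so correctness does not depend on the filter being precise.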
