Reducing non-deterministic loads in low-power caches via early cache set resolution

Many recently proposed techniques for reducing power consumption in caches introduce an additional level of non-determinism in cache access latency. When a load takes longer than expected, dependent instructions that were issued speculatively must be squashed and re-issued, because their operand data does not arrive in time. Our experiments show that this instruction squashing causes a large performance degradation and wastes dynamic energy. To address this problem, we propose an early cache set resolution scheme. Our experimental evaluation shows that the technique is effective in mitigating the problem.
