Revisiting Stack Caches for Energy Efficiency

With the growing focus on energy efficiency, it is important to find ways to reduce energy consumption without sacrificing performance. The L1 data cache is a significant contributor to processor energy consumption. We advocate treating data from the program’s stack differently from non-stack data to reduce this energy. We characterize stack accesses to determine how they differ from general memory accesses in footprint, frequency, and ratio of loads to stores. We then propose two ways to exploit these characteristics. First, the implicit stack cache restricts stack data to designated ways of the data cache, reducing the energy required per stack access. We show that it can reduce data cache dynamic energy by 37% with no reduction in performance. Second, the explicit stack cache stores stack data in a separate L1 cache. Beyond reducing the energy per access, it has two advantages over the implicit policy: it can be virtually tagged, and it can use a different writeback policy. We show that this approach can lead to additional energy savings with no performance impact. Both optimizations are implemented purely in hardware and thus require no changes to existing code.
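
The way-restriction idea behind the implicit stack cache can be illustrated with a toy model. The sketch below is not the paper's implementation: the class name, the address range used to recognize stack accesses, the cache geometry, the LRU replacement, and the use of "ways probed" as a stand-in for dynamic energy are all assumptions made for illustration. It also assumes non-stack data may occupy any way, while stack data is confined to the designated ones.

```python
LINE_BYTES = 64  # assumed cache line size


class WayPartitionedCache:
    """Toy set-associative cache in which stack lines may only occupy
    the first `stack_ways` ways of each set (hypothetical model)."""

    def __init__(self, num_sets=64, num_ways=8, stack_ways=2,
                 stack_lo=0x7FFF_0000, stack_hi=0x8000_0000):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.stack_ways = stack_ways
        # Hypothetical bounds of the stack region; real hardware would
        # need some other way to classify an access as a stack access.
        self.stack_lo, self.stack_hi = stack_lo, stack_hi
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.last_use = [[0] * num_ways for _ in range(num_sets)]
        self.clock = 0
        self.ways_probed = 0  # crude proxy for dynamic energy
        self.misses = 0

    def is_stack(self, addr):
        return self.stack_lo <= addr < self.stack_hi

    def access(self, addr):
        """Look up one address; return True on a hit, False on a miss."""
        self.clock += 1
        line = addr // LINE_BYTES
        idx = line % self.num_sets
        tag = line // self.num_sets
        # Stack data can only live in the designated ways, so a stack
        # access probes only those; other accesses probe every way.
        if self.is_stack(addr):
            ways = range(self.stack_ways)
        else:
            ways = range(self.num_ways)
        self.ways_probed += len(ways)
        for w in ways:
            if self.tags[idx][w] == tag:
                self.last_use[idx][w] = self.clock
                return True
        # Miss: replace the least recently used way among those allowed.
        self.misses += 1
        victim = min(ways, key=lambda w: self.last_use[idx][w])
        self.tags[idx][victim] = tag
        self.last_use[idx][victim] = self.clock
        return False


cache = WayPartitionedCache()
for i in range(100):
    cache.access(0x7FFF_F000 + 8 * (i % 16))  # small, hot stack footprint
    cache.access(0x1000_0000 + 64 * i)        # streaming non-stack data
print(cache.ways_probed)  # 1000, versus 200 * 8 = 1600 if every access probed all ways
```

In this synthetic trace half of the accesses go to the stack, so the probe count falls from 1600 to 1000. The figure only illustrates the mechanism and says nothing about the energy savings measured in the paper.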
