Stack Caching Using Split Data Caches

In most embedded and general-purpose architectures, stack data and non-stack data are cached together, so writing to or loading from the stack may evict non-stack data from the data cache. Stack accesses follow a different memory access pattern than non-stack accesses and show higher temporal and spatial locality. We propose caching stack and non-stack data separately and develop four stack caches that allow this separation without requiring compiler support: the simple stack cache, the window stack cache, and the prefilling stack cache with and without tags. The stack cache architectures were evaluated using the SimpleScalar toolset. The window stack cache and the prefilling stack cache without tag achieved an execution speedup of up to 3.5% on the MiBench benchmarks, executed on an out-of-order processor with the ARM instruction set.
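The interference described above can be illustrated with a toy simulation. The following is a minimal sketch, not the paper's design: it models plain direct-mapped caches, and it assumes that stack accesses are recognized by comparing the address against a fixed stack region. All names, cache sizes, and addresses (`DirectMappedCache`, `STACK_LIMIT`, `STACK_BASE`, the trace) are hypothetical and chosen only to make one stack line and one non-stack line collide in the unified cache.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only counts hits and misses."""

    def __init__(self, num_lines, line_size=32):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one stored tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag  # a miss evicts whatever occupied the line
        self.misses += 1
        return False


# Assumed stack region, for illustration only.
STACK_LIMIT = 0x6000
STACK_BASE = 0x8000


def is_stack_address(addr):
    return STACK_LIMIT <= addr < STACK_BASE


def run_unified(trace, num_lines):
    """All accesses share one data cache."""
    cache = DirectMappedCache(num_lines)
    for addr in trace:
        cache.access(addr)
    return cache.misses


def run_split(trace, data_lines, stack_lines):
    """Stack accesses go to a separate stack cache."""
    data = DirectMappedCache(data_lines)
    stack = DirectMappedCache(stack_lines)
    for addr in trace:
        if is_stack_address(addr):
            stack.access(addr)
        else:
            data.access(addr)
    return data.misses + stack.misses


# Alternate between one non-stack address and one stack address that map
# to the same line index in an 8-line unified cache.
TRACE = [0x1000, 0x7000] * 4

unified_misses = run_unified(TRACE, num_lines=8)
split_misses = run_split(TRACE, data_lines=4, stack_lines=4)
print(unified_misses, split_misses)
```

With the same total number of lines, the unified cache misses on every access in this trace because the two addresses keep evicting each other, while the split configuration misses only on the first access to each address. Real workloads are far less adversarial, so this shows only the mechanism, not the size of the effect.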
