A Versatile Data Cache for Trace Buffer Support

Since the cache system is a predominant part of modern SoCs and its capacity is sometimes larger than necessary for specific applications, it is desirable to extend the role of the cache beyond its original purpose of performance improvement. In this paper we propose a versatile data cache, called the DT (data/trace) cache, that functions simultaneously as a regular data cache and as a trace buffer for real-time software debugging and monitoring. This is accomplished by modifying the cache organization so that a portion of the cache ways can be configured as a trace buffer at run time. The trace buffer stores the trace produced by trace generation hardware, while the remaining portion of the data cache keeps its original role. The trace can be dumped out using the existing cache write-back circuitry. A DT cache has been integrated with an instruction cache, an academic ARM7 processor, and a trace generator at the RTL level. The hardware overhead is minor and does not impair the global critical path delay. A 16 KB DT cache (8 ways, 512 lines, 8 words per line) with only 1 way configured as the trace buffer can store 12771 cycles of program trace, at the cost of merely 553 gates and a slight increase in cache miss rate on average. The experiments show that the DT cache is a highly cost-effective approach to real-time on-chip trace buffering.
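The way-partitioning idea can be illustrated with a small behavioral model. The sketch below is not the RTL design described in the paper; it is a minimal Python approximation under several assumptions: 32-bit words (consistent with 16 KB / 512 lines / 8 words), LRU replacement among the data ways, and a trace buffer that simply stops accepting words when full. The class name `DTCacheSketch` and all method names are hypothetical. Trace compression is not modelled, so the 12771-cycle figure cannot be reproduced from it.

```python
from collections import OrderedDict

WORD_BYTES = 4  # assumption: 32-bit words


class DTCacheSketch:
    """Behavioral sketch of a set-associative cache whose ways can be
    split at run time between data caching and trace buffering."""

    def __init__(self, num_ways=8, total_lines=512, words_per_line=8, trace_ways=0):
        self.num_ways = num_ways
        self.num_sets = total_lines // num_ways          # 512 / 8 = 64 sets
        self.words_per_line = words_per_line
        self.line_bytes = words_per_line * WORD_BYTES    # 32 bytes per line
        # One LRU-ordered tag store per set (assumed replacement policy).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.trace = []
        self.configure(trace_ways)

    def configure(self, trace_ways):
        """Reserve `trace_ways` ways as the trace buffer; the rest stay data ways."""
        if not 0 <= trace_ways < self.num_ways:
            raise ValueError("at least one way must remain a data way")
        self.trace_ways = trace_ways
        self.data_ways = self.num_ways - trace_ways
        # Evict least recently used lines that no longer fit in the data ways.
        for tags in self.sets:
            while len(tags) > self.data_ways:
                tags.popitem(last=False)
        self.trace_capacity_words = trace_ways * self.num_sets * self.words_per_line
        self.trace = []

    def access(self, addr):
        """Look up a byte address in the data ways; return True on a hit."""
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            tags.move_to_end(tag)        # mark as most recently used
            return True
        if len(tags) >= self.data_ways:
            tags.popitem(last=False)     # evict the LRU line in this set
        tags[tag] = None
        return False

    def trace_write(self, word):
        """Append one trace word; return False when the trace buffer is full."""
        if len(self.trace) >= self.trace_capacity_words:
            return False
        self.trace.append(word)
        return True

    def dump_trace(self):
        """Drain the buffer (stands in for dumping via the write-back path)."""
        out, self.trace = self.trace, []
        return out
```

With `trace_ways=1`, the model reserves 64 lines of 8 words (512 words, or 2 KB under the 32-bit assumption) for the trace and leaves 7 ways per set for data, which is why a working set that fits in 8 ways can start missing once a way is given to the trace buffer.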
