An Efficient Non-blocking Data Cache for Soft Processors

Soft processors often use data caches to bridge the gap between processor and main memory speeds. To keep resource usage low and clock frequency high, these caches are typically simple and blocking. Blocking caches are a poor fit for processor designs such as runahead and out-of-order execution, which require non-blocking caches to tolerate main memory latency. Conventional non-blocking caches are expensive and slow on FPGAs because they rely on content-addressable memories (CAMs). This work exploits key properties of runahead execution and demonstrates an FPGA-friendly non-blocking cache design that does not require CAMs. A 4 KB non-blocking cache operates at 329 MHz on Stratix III FPGAs while using only 270 logic elements. A 32 KB non-blocking cache operates at 278 MHz and uses 269 logic elements.
