Reducing memory latency using a small software driven array cache

From the programming viewpoint, data references can be classified into two types: array references and non-array references. Array references have relatively strong spatial locality, while non-array references have relatively strong temporal locality. However, in current data cache designs, the hardware cannot distinguish between these two types of references. Both types of data are stored in the same cache space, and all cache control mechanisms, such as prefetching, are applied to array references as well as to non-array references. As a result, data cache performance is often unsatisfactory. The large working set of array references, which has weak temporal locality, interferes with the small working set of non-array references, which has strong temporal locality, and evicts it from the cache. Applying a hardware-driven data prefetching scheme to array references might improve cache performance. However, when the same scheme is applied to non-array references, performance may degrade because of serious cache pollution. To solve these problems, this paper proposes a new software-driven cache design called the array cache. The main idea is to use a separate cache space to store and handle array references with constant strides, which are prefetched accurately with the help of the compiler and with extremely low runtime overhead.
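The interference effect and the split-cache idea described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the design evaluated in the paper: the names `LRUCache`, `make_trace`, `run_unified`, and `run_split`, the line size, the cache sizes, and the trace are all hypothetical. The compiler's knowledge is modeled by tagging each reference as `"array"` or `"scalar"` and attaching its stride, and a prefetch is assumed to complete before the next reference arrives.

```python
from collections import OrderedDict

LINE = 4  # words per cache line (assumed for illustration)


class LRUCache:
    """Fully associative cache with LRU replacement that counts demand misses."""

    def __init__(self, lines):
        self.lines = lines
        self.store = OrderedDict()
        self.misses = 0

    def access(self, addr, count=True):
        tag = addr // LINE
        if tag in self.store:
            self.store.move_to_end(tag)      # refresh LRU position on a hit
            return True
        if count:
            self.misses += 1                 # prefetches are not demand misses
        self.store[tag] = True
        if len(self.store) > self.lines:
            self.store.popitem(last=False)   # evict the least recently used line
        return False


def make_trace(n_array=256, n_scalars=8):
    """Interleave a constant-stride array sweep with repeated scalar reads."""
    trace = []
    for i in range(n_array):
        trace.append(("array", 1000 + i * LINE, LINE))      # (kind, address, stride)
        trace.append(("scalar", (i % n_scalars) * LINE, 0))
    return trace


def run_unified(trace, lines=10):
    """One shared cache: the array stream and the scalars compete for lines."""
    cache = LRUCache(lines)
    for _, addr, _ in trace:
        cache.access(addr)
    return cache.misses


def run_split(trace, scalar_lines=8, array_lines=2):
    """Separate array cache; each array reference prefetches addr + stride."""
    scalar, array = LRUCache(scalar_lines), LRUCache(array_lines)
    for kind, addr, stride in trace:
        if kind == "array":
            array.access(addr)
            array.access(addr + stride, count=False)  # compiler-supplied stride
        else:
            scalar.access(addr)
    return scalar.misses + array.misses


if __name__ == "__main__":
    trace = make_trace()
    print("unified misses:", run_unified(trace))
    print("split misses:  ", run_split(trace))
```

With the same total capacity (10 lines), the unified cache in this toy trace misses on every reference because the array stream keeps evicting the scalars, while the split design takes only the cold misses. Real behavior depends on prefetch latency, associativity, and the access pattern, none of which are modeled here.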
