Cache resident data locality analysis

The data cache organization of a computer can significantly affect overall data access latency when a program executes. Cache performance depends on the locality characteristics of the data a program processes as well as on the underlying architecture. A typical executing program has a data access profile that exhibits both temporal and spatial locality. Because most processors have a single data cache at a given level of the hierarchy, and a single cache cannot be tuned exclusively for either spatial or temporal locality, cache pollution and inefficient use of cache resources can occur. In the worst case, these effects can introduce additional data access latency through repeated line fills. An analysis and modeling scheme is presented that describes the runtime data access behavior of several benchmark programs in a typical unified data cache. The model is intended to produce information that may aid in the design of a split data cache, with one side optimized for temporal locality accesses and the other for spatial locality accesses.
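The abstract does not spell out the analysis scheme itself. As a minimal illustration of the kind of measurement it describes, the Python sketch below replays a byte-address trace through a hypothetical direct-mapped unified cache and labels each line residency (fill to eviction) by how it was actually used: "temporal" if some word was re-referenced, "spatial" if several distinct words were referenced, "both", or "none" if the line was filled, touched once, and evicted, which is the pollution and repeated-fill case mentioned above. The cache geometry (32-byte lines, 4-byte words, 256 lines) and the names `profile` and `Residency` are assumptions chosen for illustration, not the paper's model.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical cache geometry (assumed, not taken from the paper).
LINE_SIZE = 32   # bytes per cache line
WORD_SIZE = 4    # bytes per word
NUM_LINES = 256  # direct-mapped frames, 8 KiB total


@dataclass
class Residency:
    """Bookkeeping for one line while it stays resident in a frame."""
    tag: int
    word_hits: Counter = field(default_factory=Counter)

    def classify(self) -> str:
        temporal = any(n > 1 for n in self.word_hits.values())  # a word reused
        spatial = len(self.word_hits) > 1                         # several words used
        if temporal and spatial:
            return "both"
        if temporal:
            return "temporal"
        if spatial:
            return "spatial"
        return "none"  # filled, touched once, evicted: pure pollution


def profile(trace, line_size=LINE_SIZE, word_size=WORD_SIZE, num_lines=NUM_LINES):
    """Replay a byte-address trace through a direct-mapped unified cache.

    Returns (number of line fills, Counter of residency classifications).
    """
    frames = [None] * num_lines  # one Residency (or None) per frame
    result = Counter()
    fills = 0
    for addr in trace:
        block = addr // line_size              # line-aligned block number
        index = block % num_lines              # frame the block maps to
        tag = block // num_lines
        word = (addr % line_size) // word_size
        frame = frames[index]
        if frame is None or frame.tag != tag:  # miss: fill the frame
            if frame is not None:              # record how the evicted line was used
                result[frame.classify()] += 1
            frame = frames[index] = Residency(tag)
            fills += 1
        frame.word_hits[word] += 1
    for frame in frames:                       # classify lines still resident at the end
        if frame is not None:
            result[frame.classify()] += 1
    return fills, result


if __name__ == "__main__":
    sweep = list(range(0, 1024, 4))   # sequential 4-byte sweep over 1 KiB
    hot = [0x4000] * 10               # one word referenced repeatedly
    thrash = [0, 0x2000] * 4          # two blocks mapping to the same frame
    for name, trace in [("sweep", sweep), ("hot", hot), ("thrash", thrash)]:
        fills, res = profile(trace)
        print(name, fills, dict(res))
```

With this geometry the sequential sweep yields 32 fills, all classified "spatial"; the hot word yields a single "temporal" residency; and the two conflicting addresses cause 8 fills, every one evicted unused ("none"). Per-residency counts of this kind are one plausible input for deciding how to divide capacity between the temporal and spatial sides of a split cache.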
