The concept of cache memory has emerged as a solution to the ever-increasing speed gap between processor technology and memory technology. Since the very early work of Wilkes [13], the concept has evolved into a sophisticated system of hardware-implemented and software-implemented solutions. In practice, the best performance/complexity ratio is obtained through a synergistic interaction of hardware-based and software-based solutions. The efficiency of a caching system is achieved by appropriately exploiting the principles of temporal and spatial locality. Traditionally, temporal locality means that the probability is relatively high that a data or instruction item will be reused in the near future. Spatial locality means that the probability is relatively high that the next data or instruction item to be used is in some way a neighbor of the previously used data or instruction item. In traditional systems, temporal locality is exploited by keeping some of the most recently used data/instructions in the cache memory and by incorporating a cache hierarchy. Spatial locality is exploited by using larger cache blocks and by incorporating prefetching mechanisms into the caching system. As technology becomes more sophisticated, it has become clear that much better performance can be achieved by incorporating more sophisticated solutions for enhancing and exploiting the locality present in the code or data. As microprocessors become more complex, cache design and performance are increasingly affected by the solutions used in other domains, such as superpipelining, superscaling, multithreading, prediction, parallelization, etc. Implementation issues in modern microprocessor systems are taking on new dimensions. The issues of most
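The two locality principles described above can be illustrated with a small simulation. The following is a minimal sketch, assuming a fully associative cache with least-recently-used (LRU) replacement; the function name `hit_ratio`, its parameters, and the two synthetic address traces are hypothetical and chosen only for illustration, not taken from any of the cited designs.

```python
from collections import OrderedDict


def hit_ratio(addresses, num_blocks, block_size):
    """Return the hit ratio of a fully associative LRU cache (illustrative model)."""
    cache = OrderedDict()  # block numbers, ordered from least to most recently used
    hits = 0
    for addr in addresses:
        block = addr // block_size  # neighboring addresses share one block
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark the block as most recently used
        else:
            if len(cache) >= num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / len(addresses)


# Assumed trace 1: one sequential sweep, so only spatial locality is present.
sequential = list(range(64))
# Assumed trace 2: a tight loop over four addresses, so temporal locality dominates.
looping = [0, 1, 2, 3] * 16

print(hit_ratio(sequential, num_blocks=4, block_size=1))  # 0.0: no reuse, no neighbors fetched
print(hit_ratio(sequential, num_blocks=4, block_size=8))  # 0.875: larger blocks capture neighbors
print(hit_ratio(looping, num_blocks=4, block_size=1))     # 0.9375: recently used items are reused
```

In this sketch, enlarging the block turns each miss on the sequential trace into seven subsequent hits, which is the spatial-locality effect, while the looping trace hits on every access after the first pass because its working set stays resident, which is the temporal-locality effect.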
[1] Cosimo Antonio Prete et al., "The ChARM tool for tuning embedded systems," IEEE Micro, 1997.
[2] Walid A. Najjar et al., "Experimental Evaluation of Array Caches," 1997.
[3] Wen-mei W. Hwu et al., "Run-time spatial locality detection and optimization," Proceedings of 30th Annual International Symposium on Microarchitecture, 1997.
[4] Veljko M. Milutinovic et al., "Distributed shared memory: concepts and systems," IEEE Parallel Distributed Technol. Syst. Appl., 1997.
[5] J. K. Archibald, "The cache coherence problem in shared-memory multiprocessors," 1987.
[6] Edward S. Davidson et al., "Reducing conflicts in direct-mapped caches with a temporality-based design," Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing, 1996.
[7] Josep Torrellas et al., "Data forwarding in scalable shared-memory multiprocessors," ICS '95, 1995.
[8] Ana Pont et al., "The split data cache in multiprocessor systems: an initial hit ratio analysis," Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing (PDP'99), 1999.
[9] Sarita V. Adve et al., "An evaluation of fine-grain producer-initiated communication in cache-coherent multiprocessors," Proceedings Third International Symposium on High-Performance Computer Architecture, 1997.