Dynamically Variable Line-Size Cache Architecture for Merged DRAM/Logic LSIs

SUMMARY This paper proposes a novel cache architecture suitable for merged DRAM/logic LSIs, called the “dynamically variable line-size cache (D-VLS cache).” The D-VLS cache optimizes its line size according to the characteristics of programs, and attempts to improve performance by appropriately exploiting the high on-chip memory bandwidth available on merged DRAM/logic LSIs. In our evaluation, a direct-mapped D-VLS cache improves the average memory-access time by about 20% compared to a conventional direct-mapped cache with fixed 32-byte lines. This performance improvement is better than that of a conventional direct-mapped cache of twice the size.
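
The metric quoted above, average memory-access time, is commonly modeled as hit time plus miss rate times miss penalty. The following is a minimal sketch of that textbook model and of how a relative improvement would be computed from it. The function names and all numeric inputs are hypothetical, chosen only to keep the arithmetic easy to follow; they are not the paper's model parameters or measurements.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Textbook average memory-access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def improvement(baseline, candidate):
    """Fractional reduction in access time relative to the baseline."""
    return (baseline - candidate) / baseline


# Hypothetical inputs in cycles (assumptions for illustration, not measured data).
fixed_line = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=10.0)
variable_line = amat(hit_time=1.0, miss_rate=0.02, miss_penalty=10.0)

print(f"fixed-line AMAT:    {fixed_line:.2f} cycles")
print(f"variable-line AMAT: {variable_line:.2f} cycles")
print(f"improvement:        {improvement(fixed_line, variable_line):.0%}")
```

In this simplified view, a larger line can lower the miss rate for programs with high spatial locality, while on a merged DRAM/logic LSI the wide on-chip bus keeps the miss penalty from growing with line size. The sketch holds the penalty constant for that reason, which is itself an assumption.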
