Breaking Address Mapping Symmetry at Multi-levels of Memory Hierarchy to Reduce DRAM Row-buffer Conflicts

DRAM row-buffers have become a critical level of cache in the memory hierarchy, exploiting spatial locality in the cache miss stream. Row-buffer conflicts occur when a sequence of requests to different pages goes to the same memory bank, causing higher memory access latency than requests to the same row or to different banks. In this study, we first show that the address mapping symmetry between the cache and DRAM is the inherent source of row-buffer conflicts. To break this symmetry, reducing the conflicts while retaining the spatial locality, we propose and evaluate a permutation-based page interleaving scheme. We also evaluate and compare two representative cache mapping schemes that break the symmetry at the cache level. We show that the proposed page interleaving scheme outperforms all other mapping schemes in both overall performance and implementation simplicity.
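
The symmetry argument can be illustrated with a small sketch. The abstract does not specify the permutation, so the form used here (XOR of the conventional bank index with the low-order bits of the cache tag) is an assumption, as are all bit widths, the direct-mapped cache, and the function names. Under these assumptions, two addresses that conflict in the cache also land in the same bank but in different rows under conventional page interleaving, while the permuted mapping sends them to different banks and keeps all addresses within one page in one bank.

```python
# Illustrative sketch only; every constant and name below is an assumption.
PAGE_BITS = 11    # assumed 2 KB row (page) per bank
BANK_BITS = 2     # assumed 4 banks
CACHE_BITS = 21   # assumed 2 MB direct-mapped cache (tag starts at bit 21)

BANK_MASK = (1 << BANK_BITS) - 1


def conventional_bank(addr: int) -> int:
    """Bank index under plain page interleaving: the bits just above the page offset."""
    return (addr >> PAGE_BITS) & BANK_MASK


def row(addr: int) -> int:
    """DRAM row (page) index within a bank: the bits above the bank index."""
    return addr >> (PAGE_BITS + BANK_BITS)


def permuted_bank(addr: int) -> int:
    """Assumed permutation: XOR the bank index with the low-order cache-tag bits."""
    tag_low = (addr >> CACHE_BITS) & BANK_MASK
    return conventional_bank(addr) ^ tag_low


# Two addresses one cache size apart share a cache set but differ in tag.
a = 0x1000
b = a + (1 << CACHE_BITS)

same_bank_conventional = conventional_bank(a) == conventional_bank(b)
different_rows = row(a) != row(b)
same_bank_permuted = permuted_bank(a) == permuted_bank(b)
```

For this pair, the conventional mapping places both addresses in the same bank with different rows, which is the row-buffer conflict pattern described above, whereas the permuted mapping places them in different banks. Because the XOR operand comes from bits above the page offset, addresses within the same page still share a bank, so spatial locality within a row is preserved.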
