Memory access pattern-aware DRAM performance model for multi-core systems

Modeling DRAM latency is complex because most chips contain row buffers and multiple banks to exploit patterns in DRAM accesses. As a result, the latency of a DRAM access depends not only on the circuit timing parameters but also on the memory access pattern. This study derives an analytical model that predicts DRAM access performance from DRAM timing parameters and memory access pattern parameters. The performance metric is the bank busy time of the DRAM. The pattern parameters capture memory access characteristics such as the number of row-buffer misses and the number of read or write requests that hit the row buffers. The proposed model not only relates DRAM access performance to the memory access pattern but also provides information for optimizing the timing of next-generation DRAMs. The model is evaluated on the SPLASH-2 benchmarks using cycle-accurate timing simulations with DDR3 timings. The evaluation results show that, in memory-bound cases, execution time is limited by bank utilization, not by the data bus occupation ratio.
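To make the idea of a bank-busy-time metric concrete, the following is a minimal sketch of how such a quantity could be computed from pattern counts and lumped per-event costs. It is not the model derived in the paper: the linear form, the class and function names, and all cycle values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventCosts:
    """Hypothetical lumped bank-occupancy costs, in memory-clock cycles.

    These are placeholder values, not DDR3 datasheet numbers and not
    parameters taken from the paper.
    """
    row_miss_cycles: int = 26   # assumed: precharge + activate + column access
    read_hit_cycles: int = 4    # assumed: column read that hits the open row
    write_hit_cycles: int = 6   # assumed: column write that hits the open row


@dataclass(frozen=True)
class AccessPattern:
    """Per-bank access counts observed over some interval."""
    row_misses: int
    read_hits: int
    write_hits: int


def bank_busy_cycles(pattern: AccessPattern, costs: EventCosts) -> int:
    """Sum the cycles a bank is occupied, assuming events do not overlap."""
    return (
        pattern.row_misses * costs.row_miss_cycles
        + pattern.read_hits * costs.read_hit_cycles
        + pattern.write_hits * costs.write_hit_cycles
    )


def bank_utilization(pattern: AccessPattern, costs: EventCosts,
                     elapsed_cycles: int) -> float:
    """Fraction of the interval during which the bank is busy."""
    if elapsed_cycles <= 0:
        raise ValueError("elapsed_cycles must be positive")
    return bank_busy_cycles(pattern, costs) / elapsed_cycles


if __name__ == "__main__":
    pattern = AccessPattern(row_misses=10, read_hits=100, write_hits=50)
    costs = EventCosts()
    print(bank_busy_cycles(pattern, costs))
    print(bank_utilization(pattern, costs, elapsed_cycles=1920))
```

In this toy form, a workload with many row-buffer misses can keep a bank busy for most of the interval even when few cycles are spent transferring data, which is the kind of distinction between bank utilization and data bus occupation that the abstract draws.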
