ConGen: An Application Specific DRAM Memory Controller Generator

The growing gap between the bandwidth requirements of modern Systems on Chip (SoCs) and the I/O data rate delivered by Dynamic Random Access Memory (DRAM), known as the Memory Wall, limits the performance of today's data-intensive applications. General-purpose memory controllers use online scheduling techniques to increase memory bandwidth. Because of their limited buffer depth, they have only a local view of the executing application. However, numerous applications have regular or fixed memory access patterns, which are not yet exploited to overcome the Memory Wall. In this paper, we present a holistic methodology for generating an Application Specific Memory Controller (ASMC), which has a global view of the application and uses application knowledge to reduce energy consumption and increase bandwidth. To generate an ASMC, we analyze the DRAM access pattern of the application offline and generate a custom address mapping by solving a combinatorial sequence partitioning problem.
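
To make the idea of an offline, trace-driven address mapping concrete, the following is a minimal sketch and not the method of the paper. It assumes a toy DRAM with 2 column bits, 1 bank bit, and 3 row bits, counts row-buffer misses for a known access trace, and picks the best address-bit permutation by brute force instead of solving a sequence partitioning problem. All names (`decode`, `row_misses`, `best_mapping`) and the strided trace are hypothetical.

```python
from itertools import permutations


def decode(addr, bit_order, n_col, n_bank):
    """Split an address into (bank, row, column) using a bit permutation.

    bit_order[i] is the physical address bit that feeds logical bit i.
    The lowest n_col logical bits form the column, the next n_bank bits
    the bank, and the remaining bits the row (assumed toy layout).
    """
    logical = 0
    for i, src in enumerate(bit_order):
        logical |= ((addr >> src) & 1) << i
    col = logical & ((1 << n_col) - 1)
    bank = (logical >> n_col) & ((1 << n_bank) - 1)
    row = logical >> (n_col + n_bank)
    return bank, row, col


def row_misses(trace, bit_order, n_col, n_bank):
    """Count accesses that must open a new row in their bank."""
    open_row = {}  # bank -> currently open row
    misses = 0
    for addr in trace:
        bank, row, _ = decode(addr, bit_order, n_col, n_bank)
        if open_row.get(bank) != row:
            misses += 1
            open_row[bank] = row
    return misses


def best_mapping(trace, n_bits, n_col, n_bank):
    """Exhaustively search bit permutations (feasible only for tiny n_bits)."""
    return min(
        permutations(range(n_bits)),
        key=lambda order: row_misses(trace, order, n_col, n_bank),
    )


# Hypothetical strided trace: stride 16 within each of four base offsets.
trace = [base + 16 * k for base in range(4) for k in range(4)]
identity = tuple(range(6))
best = best_mapping(trace, n_bits=6, n_col=2, n_bank=1)

print(row_misses(trace, identity, 2, 1))  # 16: every access opens a new row
print(row_misses(trace, best, 2, 1))      # 4: one row activation per group
```

In this toy setting the default mapping places the stride bits in the row address, so every access is a row miss, while a mapping chosen with knowledge of the whole trace moves those bits into the column address. The exhaustive search grows factorially with the number of address bits, which is one reason a realistic generator would need a structured combinatorial formulation.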
