Analytical synthesis of bandwidth-efficient SDRAM address generators

SDRAM memories are a commodity technology that delivers fast, cheap and high-capacity external memory in many cost-sensitive embedded applications. When designing with SDRAM, the available memory bandwidth depends strongly on the sequence of addresses requested. For applications with hard real-time performance requirements, it is prudent to perform some form of compile-time analysis to guarantee that those deadlines are met. With SDRAM memories this analysis is generally difficult, which leads to conservative implementations. On-chip memory buffers can enable data reuse and request reordering, which together ensure that bandwidth on an SDRAM interface is used efficiently. This paper outlines an automated procedure for synthesizing application-specific address generators that exploit data reuse in on-chip memory and transaction reordering on an external memory interface. We quantify the impact this has on memory bandwidth over a range of representative benchmarks. Across a set of parameterized designs, we observe up to a 50x reduction in the quantity of data fetched from external memory. This, combined with reordering of the transactions, allows up to a 128x reduction in the memory access time of certain memory-intensive benchmarks implemented in an FPGA. Since the synthesis procedure results in monotonic memory addressing functions, we can extract tight worst-case execution time (WCET) bounds that are useful in system analysis. We show that these performance guarantees are significantly tighter than the absolute worst-case SDRAM performance.
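The interplay of data reuse, reordering and predictability described above can be illustrated with a toy model. The sketch below is not the paper's synthesis procedure: it assumes a simplified open-row SDRAM with hypothetical constants (`ROW_WORDS`, `NUM_BANKS`, `T_HIT`, `T_MISS`) and a hypothetical `stencil_reads` workload, and its "reordered" stream assumes an idealized on-chip buffer large enough to hold every reused word. It compares words fetched and modelled cycles for program order against a deduplicated, address-ordered stream, and shows that a monotonic stream admits a closed-form cycle count without simulation.

```python
# Illustrative sketch only: a simplified open-row SDRAM cost model.
# All geometry and timing constants are hypothetical, not device parameters.
ROW_WORDS = 512  # assumed words per SDRAM row
NUM_BANKS = 4    # assumed number of banks
T_HIT = 1        # assumed cycles for an access to an already-open row
T_MISS = 10      # assumed cycles when a row must be (re)opened


def bank_and_row(addr):
    """Map a word address to (bank, row), interleaving consecutive rows across banks."""
    row_index = addr // ROW_WORDS
    return row_index % NUM_BANKS, row_index // NUM_BANKS


def access_cycles(addresses):
    """Simulate an address stream, keeping one open row per bank."""
    open_rows = {}  # bank -> currently open row
    cycles = 0
    for addr in addresses:
        bank, row = bank_and_row(addr)
        if open_rows.get(bank) == row:
            cycles += T_HIT
        else:
            cycles += T_MISS
            open_rows[bank] = row
    return cycles


def monotonic_bound(addresses):
    """Closed-form cycles for a non-decreasing stream: under this mapping each
    distinct (bank, row) is opened exactly once, and every other access hits."""
    assert all(a <= b for a, b in zip(addresses, addresses[1:])), "stream must be monotonic"
    rows = {bank_and_row(a) for a in addresses}
    return len(rows) * T_MISS + (len(addresses) - len(rows)) * T_HIT


def stencil_reads(n, k):
    """Hypothetical workload: program-order reads of a K-tap vertical window
    swept column by column over an n x n row-major array."""
    return [(i + di) * n + j
            for j in range(n)
            for i in range(n - k + 1)
            for di in range(k)]


naive = stencil_reads(64, 3)
# Idealized reuse + reordering: fetch each distinct word once, in address order.
reordered = sorted(set(naive))

print(len(naive), len(reordered))                            # words fetched
print(access_cycles(naive), access_cycles(reordered))        # simulated cycles
print(len(reordered) * T_MISS, monotonic_bound(reordered))   # all-miss vs. closed form
```

With these assumed constants the sketch fetches 4096 words instead of 11904, and the modelled cost drops from 16512 to 4168 cycles. For the monotonic stream the closed-form count equals the simulated cost, whereas assuming every access misses would give 40960 cycles, which mirrors, in miniature, the gap between a tight bound and the absolute worst case.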
