An optimal memory allocation scheme for scratch-pad-based embedded systems

This article presents a technique for efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided because of their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to the different memory units to maximize performance is the responsibility of the software. Current practice typically leaves it to the programmer to partition the data among the memory units. We present a compiler strategy that partitions the data automatically. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. Our allocation scheme is the first to distribute stack data among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy instead of a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation instead of a simpler greedy strategy; both figures are measured with an SRAM size equal to 20% of the total data size. For some programs, placing less than 5% of the data in SRAM achieves a similar speedup.
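To make the contrast between the two allocation strategies concrete, the sketch below shows the kind of greedy, profile-driven placement that the abstract compares against the linear (integer-programming-style) formulation: variables are ranked by profiled accesses per byte and packed into scratch-pad SRAM until its capacity is exhausted. This is only an illustrative assumption of how such a greedy pass might look; the data structures and function names (Var, allocate_greedy, by_density) are hypothetical and not the paper's actual compiler interface.

/* Minimal sketch of a greedy profile-driven allocator (illustrative only).
 * Variables are sorted by dynamic accesses per byte, taken from a profile
 * run, and placed into scratch-pad SRAM until it is full; the rest fall
 * back to slower DRAM. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char   *name;     /* symbol name of the global or stack variable */
    size_t        size;     /* size in bytes */
    unsigned long accesses; /* dynamic access count from the profile run */
    int           in_sram;  /* decision: 1 = scratch-pad SRAM, 0 = DRAM */
} Var;

/* Sort descending by accesses per byte (the greedy "value density"). */
static int by_density(const void *a, const void *b)
{
    const Var *x = a, *y = b;
    double dx = (double)x->accesses / (double)x->size;
    double dy = (double)y->accesses / (double)y->size;
    return (dy > dx) - (dy < dx);
}

static void allocate_greedy(Var *vars, size_t n, size_t sram_capacity)
{
    qsort(vars, n, sizeof *vars, by_density);
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (used + vars[i].size <= sram_capacity) {
            vars[i].in_sram = 1;      /* frequently accessed: keep in SRAM */
            used += vars[i].size;
        } else {
            vars[i].in_sram = 0;      /* spill to DRAM */
        }
    }
}

int main(void)
{
    Var vars[] = {
        { "coeff_table",  256,  90000, 0 },
        { "input_buf",   4096, 120000, 0 },
        { "scratch",      512,    800, 0 },
    };
    allocate_greedy(vars, 3, 1024);   /* assume a 1 KB scratch-pad SRAM */
    for (size_t i = 0; i < 3; i++)
        printf("%-12s -> %s\n", vars[i].name,
               vars[i].in_sram ? "SRAM" : "DRAM");
    return 0;
}

A greedy pass like this can make locally good but globally suboptimal choices (it is a heuristic for a knapsack-style problem), which is why the paper's linear optimization strategy, solving the placement exactly, yields the additional 11.8% runtime reduction reported above.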
