A post-compiler approach to scratchpad mapping of code

ScratchPad Memories (SPMs) are commonly used in embedded systems because they are more energy-efficient than caches and enable tighter application control on the memory hierarchy. Optimally mapping code and data to SPMs is, however, still a challenge. This paper proposes an optimal scratchpad mapping approach for code segments, which has the distinctive characteristic of working directly on application binaries, thus requiring no access to either the compiler or the application source code - a clear advantage for legacy or proprietary, IP-protected applications.The mapping problem is solved by means of a Dynamic Programming algorithm applied to the execution traces of the target application. The algorithm is able to find the optimal set of instructions blocks to be moved into a dedicated SPM, either minimizing energy consumption or execution times. A patching tool, which can use the output of the optimal mapper, modifies the binary of the application and moves the relevant portions of its code segments to memory locations inside of the SPM.

[1]  Luca Benini,et al.  An integrated hardware/software approach for run-time scratchpad management , 2004, Proceedings. 41st Design Automation Conference, 2004..

[2]  Oscar H. Ibarra,et al.  Fast Approximation Algorithms for the Knapsack and Sum of Subset Problems , 1975, JACM.

[3]  Nikil D. Dutt,et al.  Local memory exploration and optimization in embedded systems , 1999, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[4]  Nikil D. Dutt,et al.  Efficient utilization of scratch-pad memory in embedded processor applications , 1997, Proceedings European Design and Test Conference. ED & TC 97.

[5]  Mahmut T. Kandemir,et al.  Exploiting shared scratch pad memory space in embedded multiprocessor systems , 2002, DAC '02.

[6]  Peter Marwedel,et al.  Scratchpad memory: a design alternative for cache on-chip memory in embedded systems , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).

[7]  Mahmut T. Kandemir,et al.  Compiler-directed scratch pad memory hierarchy design and management , 2002, DAC '02.

[8]  Sean R Eddy,et al.  What is dynamic programming? , 2004, Nature Biotechnology.

[9]  David Seal,et al.  ARM Architecture Reference Manual , 2001 .

[10]  Sumesh Udayakumaran,et al.  Compiler-decided dynamic memory allocation for scratch-pad based embedded systems , 2003, CASES '03.

[11]  Masato Nagamatsu,et al.  A microprocessor with a 128-bit CPU, ten floating-point MAC's, four floating-point dividers, and an MPEG-2 decoder , 1999, IEEE J. Solid State Circuits.

[12]  Luca Benini,et al.  Polynomial-time algorithm for on-chip scratchpad memory partitioning , 2003, CASES '03.

[13]  Erik Brockmeyer,et al.  Data Memory Organization and Optimizations in Application-Specific Systems , 2001, IEEE Des. Test Comput..

[14]  Luca Benini,et al.  Performance Analysis of Arbitration Policies for SoC Communication Architectures , 2003, Des. Autom. Embed. Syst..

[15]  Hans Kellerer,et al.  Knapsack problems , 2004 .

[16]  Keisuke Inoue,et al.  A 250-MHz single-chip multiprocessor for audio and video signal processing , 2001 .

[17]  Sartaj Sahni,et al.  Approximate Algorithms for the 0/1 Knapsack Problem , 1975, JACM.

[18]  Peter Marwedel,et al.  Cache-aware scratchpad allocation algorithm , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[19]  Mahmut T. Kandemir,et al.  Dynamic management of scratch-pad memory space , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[20]  Peter Marwedel,et al.  Data partitioning for maximal scratchpad usage , 2003, ASP-DAC '03.

[21]  Mahmut T. Kandemir,et al.  Exploiting scratch-pad memory using Presburger formulas , 2001, International Symposium on System Synthesis (IEEE Cat. No.01EX526).

[22]  M. Kandemir,et al.  Power-aware partitioned cache architectures , 2001, ISLPED'01: Proceedings of the 2001 International Symposium on Low Power Electronics and Design (IEEE Cat. No.01TH8581).

[23]  Peter Marwedel,et al.  Assigning program and data objects to scratchpad for energy reduction , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[24]  Peter Marwedel,et al.  Reducing energy consumption by dynamic copying of instructions onto onchip memory , 2002, 15th International Symposium on System Synthesis, 2002..

[25]  Enrico Macii,et al.  Improving the efficiency of memory partitioning by address clustering , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[26]  Luca Benini,et al.  Layout-driven memory synthesis for embedded systems-on-chip , 2002, IEEE Trans. Very Large Scale Integr. Syst..

[27]  Luca Benini,et al.  Increasing Energy Efficiency of Embedded Systems by Application-Specific Memory Hierarchy Generation , 2000, IEEE Des. Test Comput..

[28]  Luca Benini,et al.  Analyzing on-chip communication in a MPSoC environment , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[29]  Chaitali Chakrabarti,et al.  Memory exploration for low power embedded systems , 1999, ISCAS'99. Proceedings of the 1999 IEEE International Symposium on Circuits and Systems VLSI (Cat. No.99CH36349).

[30]  Luca Benini,et al.  Legacy SystemC co-simulation of multi-processor systems-on-chip , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[31]  T. Sakamoto,et al.  A high bandwidth superscalar microprocessor for multimedia applications , 1999, 1999 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC. First Edition (Cat. No.99CH36278).