Dynamic Scratch-Pad Memory Management for Irregular Array Access Patterns

Many embedded applications, such as those running on set-top boxes, wireless base stations, HDTV, and mobile handsets, are structured as nested loops and benefit significantly from software-managed memory. Prior work on scratch-pad memories (SPMs) has focused primarily on applications with regular data access patterns. Some embedded applications do not fit this category, however, and conventional SPM management schemes consequently fail to produce the best results for them. In this work, we propose a novel compilation strategy for data SPMs that targets embedded applications with irregular data access patterns. Our scheme divides the optimization task between the compiler and the runtime. The compiler processes each loop nest and inserts code that collects information at runtime. It then modifies the code so that, depending on the collected information, the program dynamically chooses whether to use the data SPM for a given set of accesses to irregular arrays. Our results indicate that this approach is very successful for applications with irregular access patterns: it improves their execution cycles by about 54% over a state-of-the-art SPM management technique and 23% over conventional cache memories. The additional code size overhead incurred by our approach is less than 5% for all applications tested.
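
The division of work between compiler and runtime can be illustrated with a minimal sketch in C. The abstract does not say which information is collected or how the decision is made, so everything below is an assumption made for illustration: the names (inspect, use_spm, sum_irregular, SPM_WORDS), the choice of the touched index range as the collected information, and the "fits in the SPM and is reused at least twice on average" rule. The static array spm merely stands in for an on-chip scratch-pad.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPM_WORDS 256          /* hypothetical SPM capacity, in ints */
static int spm[SPM_WORDS];     /* stand-in for the on-chip scratch-pad */

/* Runtime information about one irregular access A[idx[i]] (assumed form). */
typedef struct { size_t lo, hi; } footprint_t;

/* Inserted collection code: record the range of A that idx touches (n > 0). */
static footprint_t inspect(const size_t *idx, size_t n) {
    footprint_t f = { idx[0], idx[0] };
    for (size_t i = 1; i < n; i++) {
        if (idx[i] < f.lo) f.lo = idx[i];
        if (idx[i] > f.hi) f.hi = idx[i];
    }
    return f;
}

/* Assumed heuristic: use the SPM only if the touched range fits in it
   and each copied word is accessed at least twice on average. */
static int use_spm(footprint_t f, size_t n) {
    size_t span = f.hi - f.lo + 1;
    return span <= SPM_WORDS && n >= 2 * span;
}

/* Computes the sum of A[idx[i]]; picks the SPM or non-SPM loop at runtime
   and reports the choice through *used_spm. */
static long sum_irregular(const int *A, const size_t *idx, size_t n,
                          int *used_spm) {
    long sum = 0;
    *used_spm = 0;
    if (n == 0) return 0;

    footprint_t f = inspect(idx, n);
    *used_spm = use_spm(f, n);

    if (*used_spm) {
        size_t span = f.hi - f.lo + 1;
        assert(span <= SPM_WORDS);                  /* guaranteed by use_spm */
        memcpy(spm, A + f.lo, span * sizeof A[0]);  /* stage data in the SPM */
        for (size_t i = 0; i < n; i++)
            sum += spm[idx[i] - f.lo];              /* accesses served by SPM */
    } else {
        for (size_t i = 0; i < n; i++)
            sum += A[idx[i]];                       /* direct memory accesses */
    }
    return sum;
}
```

In this sketch, a tightly clustered index array with repeated entries selects the SPM version of the loop, while widely scattered indices fall back to the ordinary loop. A real implementation would weigh the copy cost against the expected savings using whatever information the inserted code actually collects.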
