Adaptive Scratch Pad Memory Management for Dynamic Behavior of Multimedia Applications

Exploiting runtime memory access traces can complement compiler optimizations in reducing the energy consumed by the memory hierarchy. This is particularly important for emerging multimedia applications. Their runtime behavior is usually input-sensitive, which results in dynamic and/or irregular memory access patterns. Such applications are hard to optimize with static compiler techniques alone, because their behavior is unknown until runtime and may even change during computation. To tackle this problem, we propose an integrated approach that combines software [compiler and operating system (OS)] and hardware (data access record table) techniques to exploit the data reusability of multimedia applications in Multiprocessor Systems-on-Chip (MPSoCs). Compiler analysis generates scratch pad data layouts, and hardware components track dynamic memory accesses. Guided by both, a runtime scratch pad memory manager incorporated in the OS adapts the scratch pad data layout to the input data pattern. The runtime data placement strategy presented in this paper provides efficient scratch pad utilization for such dynamic applications. The goal is to minimize the number of main memory accesses over the entire runtime of the system, thereby reducing its energy consumption. Our experimental results show that our approach significantly reduces the energy consumption of multimedia applications with dynamic memory access behavior compared with an existing compiler technique and an alternative hardware technique.
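To make the interplay between the access record table and the runtime manager more concrete, here is a minimal sketch under stated assumptions. It is not the authors' method. The names `Block`, `AccessRecordTable`, and `select_spm_layout` are hypothetical. The sketch assumes that the compiler supplies candidate data blocks with known sizes and that loading a block into the scratch pad costs one main memory access per word. The table counts main memory accesses per block during an interval. The manager then greedily places the blocks with the highest net benefit per word until the scratch pad capacity is exhausted. The greedy knapsack heuristic is a stand-in chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical candidate data block (e.g. an array tile) identified by the compiler."""
    name: str
    size_words: int


class AccessRecordTable:
    """Hypothetical software model of a hardware access record table:
    counts main memory accesses per candidate block during one interval."""

    def __init__(self):
        self.counts = Counter()

    def record(self, block_name):
        self.counts[block_name] += 1

    def reset(self):
        # Start a new observation interval so the layout can follow input changes.
        self.counts.clear()


def select_spm_layout(blocks, table, spm_capacity_words):
    """Greedy placement: keep blocks whose observed accesses exceed the assumed
    cost of copying them in (one access per word), best benefit per word first."""
    candidates = []
    for b in blocks:
        # Net benefit = main memory accesses avoided - words transferred to load the block.
        benefit = table.counts[b.name] - b.size_words
        if benefit > 0:
            candidates.append((benefit / b.size_words, b))
    candidates.sort(key=lambda c: c[0], reverse=True)

    layout, used = [], 0
    for _, b in candidates:
        if used + b.size_words <= spm_capacity_words:
            layout.append(b.name)
            used += b.size_words
    return layout


if __name__ == "__main__":
    blocks = [Block("A", 64), Block("B", 128), Block("C", 32)]
    table = AccessRecordTable()
    for name, n in (("A", 500), ("B", 200), ("C", 40)):
        for _ in range(n):
            table.record(name)
    print(select_spm_layout(blocks, table, spm_capacity_words=128))  # ['A', 'C']
```

In this example, block `B` has a positive net benefit but does not fit next to `A`, so the smaller block `C` is placed instead. Recomputing the layout at the end of each interval and then calling `table.reset()` is one simple way to let the placement track an input-dependent access pattern.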
