DRDU: A data reuse analysis technique for efficient scratch-pad memory management

In multimedia and other streaming applications, a significant portion of energy is spent on data transfers. By exploiting data reuse opportunities in the application, we can reduce this energy: frequently used data is copied into a small local memory, and slow, power-inefficient transfers from off-chip main memory are replaced by more efficient local transfers. In this article we present an automated approach for analyzing these opportunities in a program. The analysis allows the program to be modified to use custom scratch-pad memory configurations comprising a hierarchical set of buffers for local storage of frequently reused data. Using our approach with a scratch-pad memory, we reduce the energy consumption of the memory subsystem by about a factor of two on average and improve memory system performance, compared to a cache of the same size.
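To make the idea of data reuse concrete, the following is a minimal sketch in Python and not the DRDU analysis itself. It assumes a hypothetical sliding-window sum over a one-dimensional array and simply counts accesses: the direct version reads every window element from "main memory", while the buffered version keeps the current window in a small local buffer and fetches only the one new element per step. The function names, the window kernel, and the access counters are all illustrative assumptions.

```python
def window_sums_direct(data, w):
    """Sum every window of width w, reading each element from main memory."""
    main_reads = 0
    sums = []
    for i in range(len(data) - w + 1):
        total = 0
        for j in range(w):
            total += data[i + j]      # counted as an off-chip access
            main_reads += 1
        sums.append(total)
    return sums, main_reads


def window_sums_buffered(data, w):
    """Same result, but the current window lives in a local buffer.

    The buffer is circular: data[k] is stored in slot k % w, so each new
    element overwrites the one that just left the window.
    """
    main_reads = 0
    local_reads = 0
    buf = [0] * w                     # stands in for a scratch-pad buffer
    sums = []
    for i in range(len(data) - w + 1):
        if i == 0:
            for j in range(w):        # initial fill of the buffer
                buf[j] = data[j]
                main_reads += 1
        else:
            newest = i + w - 1        # only one new element per step
            buf[newest % w] = data[newest]
            main_reads += 1
        total = 0
        for j in range(w):
            total += buf[j]           # counted as a local access
            local_reads += 1
        sums.append(total)
    return sums, main_reads, local_reads


if __name__ == "__main__":
    samples = list(range(16))
    direct, direct_main = window_sums_direct(samples, 4)
    buffered, buf_main, buf_local = window_sums_buffered(samples, 4)
    print(direct == buffered, direct_main, buf_main, buf_local)
```

In this toy setting the buffered version touches main memory once per input element instead of once per window element. How much energy that saves depends on the relative cost of local and off-chip accesses, which the sketch does not model.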
