Data reuse analysis technique for software-controlled memory hierarchies

In multimedia and other streaming applications a significant portion of energy is spent on data transfers. Exploiting data reuse opportunities in the application, we can reduce this energy by making copies of frequently used data in a small local memory and replacing speed and power inefficient transfers from main off-chip memory by more efficient local data transfers. In this paper we present an automated approach for analyzing these opportunities in a program that allows modification of the program to use custom scratch pad memory configurations comprising a hierarchical set of buffers for local storage of frequently reused data. Using our approach we are able to reduce energy consumption of the memory subsystem when using a scratch pad memory by a factor of two on average compare to a cache of the same size.

[1]  Mahmut T. Kandemir,et al.  Dynamic management of scratch-pad memory space , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[2]  David F. Bacon,et al.  Compiler transformations for high-performance computing , 1994, CSUR.

[3]  Rudy Lauwereins,et al.  Search space definition and exploration for nonuniform data reuse opportunities in data-dominant applications , 2003, TODE.

[4]  William Pugh,et al.  The Omega Library interface guide , 1995 .

[5]  Doran Wilde,et al.  A LIBRARY FOR DOING POLYHEDRAL OPERATIONS , 2000 .

[6]  Erik Brockmeyer,et al.  Layer assignment techniques for low energy in multi-layered memory organisations , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[7]  Chau-Wen Tseng,et al.  Improving data locality with loop transformations , 1996, TOPL.

[8]  Hugo De Man,et al.  Formalized methodology for data reuse exploration in hierarchical memory mappings , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.

[9]  Peter Marwedel,et al.  Assigning program and data objects to scratchpad for energy reduction , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[10]  Rudy Lauwereins,et al.  Data reuse exploration techniques for loop-dominated applications , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[11]  Norman P. Jouppi,et al.  Cacti 3. 0: an integrated cache timing, power, and area model , 2001 .

[12]  Hugo De Man,et al.  Cache conscious data layout organization for embedded multimedia applications , 2001, Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001.

[13]  Mahmut T. Kandemir,et al.  Compiler-directed scratch pad memory hierarchy design and management , 2002, DAC '02.

[14]  D. Verkest,et al.  Systematic high-level address code transformations for piece-wise linear indexing: illustration on a medical imaging algorithm , 2000, 2000 IEEE Workshop on SiGNAL PROCESSING SYSTEMS. SiPS 2000. Design and Implementation (Cat. No.00TH8528).

[15]  Nikil D. Dutt,et al.  Efficient utilization of scratch-pad memory in embedded processor applications , 1997, Proceedings European Design and Test Conference. ED & TC 97.

[16]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[17]  Peter Marwedel,et al.  Scratchpad memory: a design alternative for cache on-chip memory in embedded systems , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).