Dynamic Tag Reduction for Low-Power Caches in Embedded Systems with Virtual Memory

This paper presents a low-power tag organization for physically tagged caches in embedded processors with virtual memory support. For each application hot-spot, a very small subset of tag bits is identified that still provides complete address resolution, so cache accesses use only these bits with no loss in performance. This minimal subset of physical tag bits is updated dynamically as the application's physical address space changes. Operating system support maintains the reduced tags during program execution: efficient algorithms incorporated in the memory allocator and the dynamic linker perform the dynamic updates. The only hardware support required in the instruction and data caches is the ability to disable bitlines of the tag arrays. An extensive set of experimental results demonstrates the efficacy of the proposed approach.
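To make the idea of a reduced tag concrete, the following is a minimal sketch of how a smallest distinguishing subset of tag bits could be found for the set of physical tags touched by one hot-spot. It is an illustration under stated assumptions, not the paper's algorithm: the function names `project` and `minimal_tag_bits`, the 16-bit tag width, and the brute-force search are all hypothetical, and the allocator and linker algorithms mentioned in the abstract are presumably far more efficient.

```python
from itertools import combinations


def project(tag, bits):
    """Pack the selected bit positions of `tag` into a small integer."""
    out = 0
    for i, b in enumerate(bits):
        out |= ((tag >> b) & 1) << i
    return out


def minimal_tag_bits(tags, width):
    """Return a smallest tuple of bit positions that keeps all tags distinct.

    Hypothetical brute-force search: tries subsets in order of increasing
    size, so the first subset that separates every tag is minimal.
    """
    distinct = set(tags)
    for k in range(width + 1):
        for bits in combinations(range(width), k):
            # The subset resolves addresses completely if no two tags collide.
            if len({project(t, bits) for t in distinct}) == len(distinct):
                return bits
    raise ValueError("tags are not distinguishable within the given width")


# Assumed example: four physical tags that differ only in bits 0 and 3.
hot_spot_tags = [0x1A40, 0x1A41, 0x1A48, 0x1A49]
reduced = minimal_tag_bits(hot_spot_tags, width=16)
print(reduced)  # (0, 3)
```

In this assumed example only two of the sixteen tag bits need to be compared, which is the situation in which the remaining tag-array bitlines could be disabled. When the set of physical tags changes, for instance after a new page is allocated, the subset would have to be recomputed.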
