Two New Techniques Integrated for Energy-Efficient TLB Design

The translation lookaside buffer (TLB) is an essential component for speeding up virtual-to-physical address translation. Because it is looked up so frequently, however, the TLB usually consumes considerable power. This paper presents an energy-efficient TLB design for embedded processors. We first propose a real-time filter scheme that supports block buffering by eliminating redundant TLB accesses without comparator delay. By modifying the address registers to be sensitive to changes in their contents, the proposed real-time filter can identify a redundant TLB access as soon as the virtual address is generated. The second technique is a banking-like design, which aims to reduce the energy consumed per TLB access on a block buffer miss. To alleviate the performance penalty introduced by the conventional banking technique, we develop two adaptive variants of the banked TLB. Both variants match the energy efficiency of the banked TLB while maintaining the low miss ratio of the nonbanked TLB. The experimental results show that, by filtering out all redundant TLB accesses and then minimizing the energy consumption per access, our design effectively improves the energy-delay product of the TLB without any performance penalty, especially for the data TLB, which has poor locality.
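
The effect of the two techniques can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design described above: the real-time filter in the paper works through modified address registers, whereas the model below simply compares virtual page numbers in software. The 4 KiB page size, the two-bank configuration, the bank selection by low-order page-number bits, and the function names `count_tlb_lookups` and `bank_index` are all hypothetical choices made for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, so the page number is addr >> 12


def count_tlb_lookups(addresses, page_shift=PAGE_SHIFT):
    """Model a one-entry block buffer in front of the TLB.

    An access is 'filtered' when it falls in the same virtual page as the
    previous access, so the buffered translation can be reused. Otherwise
    the TLB must be looked up and the buffer is refilled.
    Returns (filtered, looked_up).
    """
    last_vpn = None
    filtered = 0
    looked_up = 0
    for addr in addresses:
        vpn = addr >> page_shift
        if vpn == last_vpn:
            filtered += 1      # redundant access: no TLB lookup needed
        else:
            looked_up += 1     # block buffer miss: access the TLB
            last_vpn = vpn
    return filtered, looked_up


def bank_index(addr, num_banks=2, page_shift=PAGE_SHIFT):
    """Pick the single bank to search on a block buffer miss.

    Assumption: the bank is chosen by the low-order bits of the page
    number, so only 1/num_banks of the entries are activated per lookup.
    """
    return (addr >> page_shift) % num_banks


trace = [0x1000, 0x1004, 0x1008, 0x2000, 0x2004, 0x1000]
filtered, looked_up = count_tlb_lookups(trace)
print(filtered, looked_up)  # 3 3
```

In this hypothetical trace, half of the accesses stay within the page of the previous access and never reach the TLB; the remaining ones would each activate only one bank. A fixed bank mapping like this is what can raise the miss ratio in a conventional banked TLB, which is the penalty the adaptive variants are meant to avoid.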
