Selective block buffering TLB system for embedded processors

The authors present a translation lookaside buffer (TLB) system with low power consumption for embedded processors. The proposed TLB is constructed as multiple banks, each with an associated block buffer and a corresponding comparator. Either the block buffer or the main bank is selectively accessed on the basis of two bits in the tag buffer. Dynamic power savings are achieved by reducing the number of entries accessed in parallel, as a result of using the tag buffer as a filtering mechanism. The performance overhead of the proposed TLB is negligible compared with other hierarchical TLB structures. For example, the two-cycle overhead of the proposed TLB is only ∼1%, as compared with 5% overhead for a filter (micro)-TLB and 14% overhead for a banked-TLB with block buffering. The authors show that the average hit ratios of the block buffers and the main banks of the proposed TLB are 94% and 6%, respectively. Dynamic power is reduced by ∼93% with respect to a fully associative TLB, 87% with respect to a filter-TLB and 60% relative to a banked-TLB with block buffering. Therefore, significant power savings are achieved with only a small performance degradation.
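The selective access described above can be illustrated with a minimal sketch. It is not the authors' design: the abstract does not say what the two tag-buffer bits encode, so the sketch assumes they are the two low-order bits of the tag currently held in a bank's block buffer. The bank count, the one-entry block buffer, and all names (`SelectiveTLB`, `Bank`, `FILTER_MASK`, `compares`) are likewise hypothetical. The counter `compares` tallies how many entries are compared per lookup, as a rough stand-in for dynamic power.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

NUM_BANKS = 4        # assumed bank count (not given in the abstract)
FILTER_MASK = 0b11   # assumed: tag-buffer bits = two low-order tag bits


@dataclass
class Bank:
    entries: Dict[int, int] = field(default_factory=dict)  # main bank: tag -> frame
    buffer: Optional[Tuple[int, int]] = None               # block buffer: (tag, frame)
    filter_bits: Optional[int] = None                      # tag-buffer bits of buffered tag


class SelectiveTLB:
    def __init__(self) -> None:
        self.banks = [Bank() for _ in range(NUM_BANKS)]
        self.compares = 0  # entries compared so far (proxy for dynamic power)

    def insert(self, vpn: int, pfn: int) -> None:
        # The low bits of the virtual page number pick the bank; the rest is the tag.
        self.banks[vpn % NUM_BANKS].entries[vpn // NUM_BANKS] = pfn

    def lookup(self, vpn: int) -> Tuple[Optional[int], str]:
        bank = self.banks[vpn % NUM_BANKS]
        tag = vpn // NUM_BANKS
        # Filter step: probe the block buffer only if the two bits match.
        if bank.filter_bits == (tag & FILTER_MASK):
            self.compares += 1
            if bank.buffer[0] == tag:
                return bank.buffer[1], "buffer"
        # Otherwise (or on a false filter match) search the main bank.
        self.compares += len(bank.entries)
        pfn = bank.entries.get(tag)
        if pfn is None:
            return None, "miss"
        # Refill the block buffer and its tag-buffer bits with the entry just used.
        bank.buffer = (tag, pfn)
        bank.filter_bits = tag & FILTER_MASK
        return pfn, "bank"


tlb = SelectiveTLB()
for vpn, pfn in [(8, 100), (12, 200), (24, 300)]:  # all three map to bank 0
    tlb.insert(vpn, pfn)
first = tlb.lookup(8)    # filter empty: the main bank is searched
second = tlb.lookup(8)   # filter matches: only the block buffer is probed
```

In this sketch a repeated access to the same page costs one comparison instead of a search of the whole bank, which is the kind of saving the abstract attributes to filtering. The sketch makes no attempt to reproduce the reported 94% and 6% hit ratios or the power figures.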
