Generating physical addresses directly for saving instruction TLB energy

Power consumption and power density for the Translation Lookaside Buffer (TLB) are important considerations not only in its design, but can have a consequence on cache design as well. This paper embarks on a new philosophy for reducing the number of accesses to the instruction TLB (iTLB) for power and performance optimizations. The overall idea is to keep a translation currently being used in a register and avoid going to the iTLB as far as possible --- until there is a page change. We propose four different approaches for achieving this, and experimentally demonstrate that one of these schemes that uses a combination of compiler and hardware enhancements can reduce iTLB dynamic power by over 85% in most cases.These mechanisms can work with different instructioncache (iL1) lookup mechanisms and achieve significant iTLB power savings without compromising on performance. Their importance grows with higher iL1 miss rates and larger page sizes. They can work very well with large iTLB structures, that can possibly consume more power and take longer to lookup, without the iTLB getting into the common case. Further, we also experimentally demonstrate that they can provide performance savings for virtually-indexed, virtually-tagged iL1 caches, and can even make physically indexed, physically-tagged iL1 caches a possible choice for implementation.

[1]  Mahmut T. Kandemir,et al.  Energy-driven integrated hardware-software optimizations using SimplePower , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[2]  William D. Strecker,et al.  VAX-11/780 - A virtual address extension to the DEC PDP-11 family , 1899, AFIPS National Computer Conference.

[3]  David J. DeWitt,et al.  DBMSs on a Modern Processor: Where Does Time Go? , 1999, VLDB.

[4]  Michel Cekleov,et al.  Virtual-address caches. Part 1: problems and solutions in uniprocessors , 1997, IEEE Micro.

[5]  James E. Smith,et al.  Early-Stage Definition of LPX: A Low Power Issue-Execute Processor , 2002, PACS.

[6]  Kazuaki Murakami,et al.  Way-predicting set-associative cache for high performance and low energy consumption , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[7]  Kanad Ghose,et al.  Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[8]  Antonio González,et al.  Energy-effective issue logic , 2001, ISCA 2001.

[9]  Trevor N. Mudge,et al.  Uniprocessor Virtual Memory without TLBs , 2001, IEEE Trans. Computers.

[10]  Kevin Skadron,et al.  Power issues related to branch prediction , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[11]  Tomás Lang,et al.  Reducing TLB power requirements , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.

[12]  R. Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[13]  Norman P. Jouppi,et al.  CACTI 2.0: An Integrated Cache Timing and Power Model , 2002 .

[14]  Doug Burger,et al.  Evaluating Future Microprocessors: the SimpleScalar Tool Set , 1996 .

[15]  Anand Sivasubramaniam,et al.  Generating physical addresses directly for saving instruction TLB energy , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[16]  Randy H. Katz,et al.  Eliminating the address translation bottleneck for physical address cache , 1992, ASPLOS V.

[17]  Margaret Martonosi,et al.  Dynamic thermal management for high-performance microprocessors , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[18]  Jun Yang,et al.  Frequent value compression in data caches , 2000, MICRO 33.

[19]  Seh-Woong Jeong,et al.  A Low Power TLB Structure for Embedded Systems , 2002, IEEE Computer Architecture Letters.

[20]  Péter Kacsuk,et al.  Advanced computer architectures - a design space approach , 1997, International computer science series.

[21]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).