Paper information - Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning

Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning

The memory subsystem, including address translation and cache accesses, consumes a major portion of a processor's overall energy. In this paper, we address this problem with a streamlined architectural partitioning technique that reduces energy consumption in the memory subsystem without compromising performance. The technique decouples d-TLB lookups and data cache accesses into discrete reference substreams (stack, global static, and heap) based on the semantic regions defined by programming languages and software conventions. The distinct access behavior and locality characteristics of each substream are analyzed and exploited for power reduction. Our results show that energy consumption in the d-TLB and the data cache can be reduced by an average of 35%. Furthermore, selectively multi-porting the semantic-aware d-TLBs and data caches saves an average of 46% energy relative to their monolithic counterparts.
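The idea of splitting one reference stream into stack, global static, and heap substreams can be illustrated with a minimal software sketch. It is not the paper's hardware mechanism: the region boundaries (`STACK_BASE`, `HEAP_BASE`, `GLOBAL_BASE`) and the function names `classify` and `split_stream` are hypothetical, chosen only to show how each address could be steered to a region-specific structure.

```python
# Hypothetical region boundaries, for illustration only. Real layouts depend
# on the OS, ABI, and linker, and these values are not taken from the paper.
STACK_BASE = 0x7FFF_0000   # addresses at or above this count as stack
HEAP_BASE = 0x1000_0000    # [HEAP_BASE, STACK_BASE) counts as heap
GLOBAL_BASE = 0x0040_0000  # [GLOBAL_BASE, HEAP_BASE) counts as global static


def classify(addr: int) -> str:
    """Return the semantic region an address falls into (assumed layout)."""
    if addr >= STACK_BASE:
        return "stack"
    if addr >= HEAP_BASE:
        return "heap"
    if addr >= GLOBAL_BASE:
        return "global"
    raise ValueError(f"address {addr:#x} is below all assumed data regions")


def split_stream(addresses):
    """Partition one reference stream into per-region substreams, in order."""
    substreams = {"stack": [], "global": [], "heap": []}
    for addr in addresses:
        substreams[classify(addr)].append(addr)
    return substreams


if __name__ == "__main__":
    trace = [0x7FFF_1000, 0x0040_2000, 0x1000_0040, 0x7FFF_0FF8]
    for region, refs in split_stream(trace).items():
        print(region, [hex(a) for a in refs])
```

In a partitioned design, each substream would be served by its own, smaller d-TLB and data cache, so a given access activates only the structure for its region.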

Hsien-Hsin S. Lee | Chinnakrishnan S. Ballapuram
