Paper information - Characterization and exploitation of narrow-width loads: the narrow-width cache approach

This paper exploits small-value locality to accelerate the execution of memory instructions. We find that narrow-width loads (NWLDs), that is, loads with small-value operands of 8 bits or fewer, account for 26% of all executed loads across 40 applications of the SPEC benchmark suites. We establish that the frequency of NWLDs is almost independent of compiler and input data. We introduce narrow-width caches (NWCs) to cache small-value memory words. NWCs provide a significant speedup for several memory-intensive applications with a negligible chip-area overhead. NWCs also reduce the overall energy dissipation and memory traffic.
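
As a rough illustration of the NWLD definition in the abstract, the sketch below classifies loaded values as narrow-width and computes their share of a load trace. It is not the paper's method: the function names are hypothetical, values are assumed to be unsigned words, and "narrow" is assumed to mean that every bit above the low 8 is zero. How the paper treats sign-extended negative values is not stated in this abstract.

```python
from typing import Iterable


def is_narrow_width(value: int, narrow_bits: int = 8) -> bool:
    """Return True if an unsigned loaded value fits in `narrow_bits` bits.

    Assumption: the value is the raw unsigned content of the loaded word.
    """
    if value < 0:
        raise ValueError("expected an unsigned word value")
    # int.bit_length() is 0 for the value 0, so zero counts as narrow.
    return value.bit_length() <= narrow_bits


def nwld_fraction(loaded_values: Iterable[int], narrow_bits: int = 8) -> float:
    """Fraction of loads in a trace whose value is narrow-width."""
    values = list(loaded_values)
    if not values:
        return 0.0
    narrow = sum(1 for v in values if is_narrow_width(v, narrow_bits))
    return narrow / len(values)
```

Applied to a trace of values returned by loads, `nwld_fraction` gives the kind of ratio the abstract reports, under the assumptions stated above.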

Per Stenström | Mafijul Md. Islam
