Characterization and exploitation of narrow-width loads: the narrow-width cache approach
[1] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor , 1996, IEEE Micro.
[2] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[3] Margaret Martonosi,et al. Dynamically exploiting narrow width operands to improve processor power and performance , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[4] Margaret Martonosi,et al. Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[5] R. Canal,et al. Very low power pipelines using significance compression , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[6] Krste Asanovic,et al. Dynamic zero compression for cache energy reduction , 2000, MICRO 33.
[7] Margaret Martonosi,et al. Value-based clock gating and operation packing: dynamic strategies for improving processor power and performance , 2000, TOCS.
[8] Mikko H. Lipasti,et al. Silent Stores and Store Value Locality , 2001, IEEE Trans. Computers.
[9] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[10] Jun Yang,et al. Frequent value locality and its applications , 2002, TECS.
[11] Jun Yang,et al. Energy efficient Frequent Value data Cache design , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35).
[12] Gabriel H. Loh. Exploiting data-width locality to increase superscalar execution bandwidth , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35).
[13] Glenn Reinman,et al. Just say no: benefits of early cache miss determination , 2003, The Ninth International Symposium on High-Performance Computer Architecture (HPCA-9).
[14] David A. Wood,et al. Adaptive cache compression for high-performance processors , 2004, 31st Annual International Symposium on Computer Architecture.
[15] Kanad Ghose,et al. Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[16] Aneesh Aggarwal,et al. Restrictive compression techniques to increase level 1 cache capacity , 2005, 2005 International Conference on Computer Design.
[17] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[18] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[19] Mateo Valero,et al. An asymmetric clustered processor based on value content , 2005, ICS '05.
[20] M. Ekman,et al. A robust main-memory compression scheme , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[21] Gilles Pokam,et al. A case for a complexity-effective, width-partitioned microarchitecture , 2006, TACO.
[22] Shyamkumar Thoziyoor,et al. CACTI 5.1 , 2008.
[23] Per Stenström,et al. Memory-Link Compression Schemes: A Value Locality Perspective , 2008, IEEE Transactions on Computers.
[24] Per Stenström,et al. Zero-Value Caches: Cancelling Loads that Return Zero , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.