A reconfigurable cache memory with heterogeneous banks

The optimal size of a large on-chip cache can be different for different programs: at some point, the reduction of cache misses achieved when increasing cache size hits diminishing returns, while the higher cache latency hurts performance. This paper presents the Amorphous Cache (AC), a reconfigurable L2 on-chip cache aimed at improving performance as well as reducing energy consumption. AC is composed of heterogeneous sub-caches as opposed to common caches using homogenous sub-caches. The sub-caches are turned off depending on the application workload to conserve power and minimize latencies. A novel reconfiguration algorithm based on Basic Block Vectors is proposed to recognize program phases, and a learning mechanism is used to select the appropriate cache configuration for each program phase. We compare our reconfigurable cache with existing proposals of adaptive and non-adaptive caches. Our results show that the combination of AC and the novel reconfiguration algorithm provides the best power consumption and performance. For example, on average, it reduces the cache access latency by 55.8%, the cache dynamic energy by 46.5%, and the cache leakage power by 49.3% with respect to a non-adaptive cache.

[1]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[2]  James E. Smith,et al.  Comparing program phase detection techniques , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[3]  藤井 透 First World Congress on Computational Mechanics 出席して , 1987 .

[4]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[5]  Jaume Abella,et al.  Heterogeneous way-size cache , 2006, ICS '06.

[6]  Doug Burger,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[7]  S. Tam,et al.  A 65-nm Dual-Core Multithreaded Xeon® Processor With 16-MB L3 Cache , 2007, IEEE Journal of Solid-State Circuits.

[8]  Vikas Agarwal,et al.  Static energy reduction techniques for microprocessor caches , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.

[9]  Babak Falsafi,et al.  Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[11]  Norman P. Jouppi,et al.  Architecting Efficient Interconnects for Large Caches with CACTI 6.0 , 2008, IEEE Micro.

[12]  Frank Vahid,et al.  A highly configurable cache architecture for embedded systems , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[13]  Bharadwaj S. Amrutur,et al.  Molecular Caches: A caching structure for dynamic creation of application-specific Heterogeneous cache regions , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[14]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[15]  Norman P. Jouppi,et al.  Reconfigurable caches and their application to media processing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[16]  Alon Naveh,et al.  Power and Thermal Management in the Intel Core Duo Processor , 2006 .

[17]  Babak Falsafi,et al.  Database Servers on Chip Multiprocessors: Limitations and Opportunities , 2007, CIDR.

[18]  Rajeev Balasubramonian,et al.  A Dynamically Tunable Memory Hierarchy , 2003, IEEE Trans. Computers.

[19]  Bart Preneel,et al.  Hash functions , 2005, Encyclopedia of Cryptography and Security.