Leveraging Unused Cache Block Words to Reduce Power in CMP Interconnect

Power is of paramount importance in modern computer system design. In particular, the cache interconnect in future CMP designs is projected to consume up to half of the system power for cache fills and spills. Yet despite the power spent on fills and spills, a significant percentage of each cache line goes unused before the line is evicted. If unused cache block words can be identified, this information can be used to reduce CMP interconnect power and energy consumption. We propose new methods of CMP interconnect packet composition that leverage knowledge of unused data to reduce power. These methods are well suited to interconnection networks with high-bandwidth wires and do not require expensive multi-ported memory systems. Assuming perfect prediction, our techniques reduce total dynamic link power consumption by an average of 37%. With our current best prediction mechanism, they reduce dynamic power consumption by 23% on average.
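The abstract does not specify the packet format, so what follows is only a rough illustration of the general idea under stated assumptions. It is not the authors' scheme. The sketch assumes a 64-byte line of eight 8-byte words, a 16-byte flit, and a per-line usage bitmask supplied by some predictor. It also assumes the mask rides in a header flit that is sent in either case. Only the words whose mask bit is set are packed into the payload. The receiver uses the mask to restore word positions. The data-flit count serves as a crude proxy for link activity. All names (`compose_packet`, `data_flits`, `decompose_packet`) are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

WORD_BYTES = 8       # assumed word size (hypothetical)
WORDS_PER_LINE = 8   # assumed 64-byte cache line (hypothetical)
FLIT_BYTES = 16      # assumed link/flit width (hypothetical)


@dataclass
class Packet:
    used_mask: int       # bit i set => word i is carried in the payload
    payload: list[int]   # only the words predicted to be used


def compose_packet(line_words: list[int], used_mask: int) -> Packet:
    """Pack only the words whose bit is set in used_mask (assumed predictor output)."""
    assert len(line_words) == WORDS_PER_LINE
    payload = [w for i, w in enumerate(line_words) if (used_mask >> i) & 1]
    return Packet(used_mask, payload)


def data_flits(pkt: Packet) -> int:
    """Data flits needed for the payload; the mask is assumed to ride in the header flit."""
    return math.ceil(len(pkt.payload) * WORD_BYTES / FLIT_BYTES)


def decompose_packet(pkt: Packet) -> list[int | None]:
    """Rebuild the line at the receiver; words that were not sent become None."""
    words = iter(pkt.payload)
    return [next(words) if (pkt.used_mask >> i) & 1 else None
            for i in range(WORDS_PER_LINE)]


if __name__ == "__main__":
    line = list(range(100, 108))
    full = compose_packet(line, 0xFF)         # baseline: send every word
    sparse = compose_packet(line, 0b00001111)  # only the first four words predicted used
    print(data_flits(full), data_flits(sparse))  # 4 2
    print(decompose_packet(sparse))
```

Under these assumptions, a line with half its words unused needs half the data flits. Real savings depend on prediction accuracy, since a mispredicted word must be fetched again, and on how link power scales with the bits actually transmitted. The sketch models neither.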
