Paper information - Adaptive data compression for high-performance low-power on-chip networks

Adaptive data compression for high-performance low-power on-chip networks

With the recent design shift towards increasing the number of processing elements on a chip, high-bandwidth support in the on-chip interconnect is essential for low-latency communication. Much of the previous work has focused on router architectures and network topologies using wide/long channels. However, such solutions may result in a complicated router design and a high interconnect cost. In this paper, we exploit a table-based data compression technique that relies on value patterns in cache traffic. Compressing a large packet into a small one can increase the effective bandwidth of routers and links, while saving power due to reduced operations. The main challenges are providing a scalable implementation of the tables and minimizing the overhead of compression latency. First, we propose a shared table scheme that needs one encoding table and one decoding table for each processing element, and a management protocol that does not require in-order delivery. Next, we present streamlined encoding that combines flit injection and encoding in a pipeline. Furthermore, data compression can be applied selectively to communication on congested paths, and only when it improves performance. Simulation results for a 16-core CMP show that our compression method improves packet latency by up to 44%, with an average of 36%, and reduces network power consumption by 36% on average.
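As a rough illustration of the table-based idea described in the abstract, the sketch below replaces data words that already sit in a small value table with short indices. It is not the paper's design: the names `ValueTable`, `encode`, `decode`, and `compressed_bits` are hypothetical, FIFO replacement and a 1-bit hit/miss flag are assumptions, and the sketch keeps the encoder and decoder tables in sync by assuming in-order delivery, which the paper's management protocol specifically does not require.

```python
class ValueTable:
    """Toy fixed-capacity value table (hypothetical; FIFO replacement is an assumption)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.values = []       # list position doubles as the table index
        self.next_victim = 0   # next slot to overwrite once the table is full

    def lookup(self, value):
        """Return the index of value, or None if it is not in the table."""
        try:
            return self.values.index(value)
        except ValueError:
            return None

    def insert(self, value):
        """Add value, overwriting the oldest entry when the table is full."""
        if len(self.values) < self.capacity:
            self.values.append(value)
        else:
            self.values[self.next_victim] = value
            self.next_victim = (self.next_victim + 1) % self.capacity


def encode(words, table):
    """Replace words found in the table with ("idx", i); send misses as ("raw", w)."""
    symbols = []
    for w in words:
        i = table.lookup(w)
        if i is not None:
            symbols.append(("idx", i))
        else:
            symbols.append(("raw", w))
            table.insert(w)    # the decoder performs the same insert on a raw word
    return symbols


def decode(symbols, table):
    """Rebuild the words, mirroring the encoder's table updates (assumes in-order delivery)."""
    words = []
    for kind, payload in symbols:
        if kind == "idx":
            words.append(table.values[payload])
        else:
            words.append(payload)
            table.insert(payload)
    return words


def compressed_bits(symbols, word_bits=32, index_bits=2):
    """Size in bits, assuming a 1-bit flag per symbol plus either an index or a full word."""
    return sum(1 + (index_bits if kind == "idx" else word_bits) for kind, _ in symbols)


if __name__ == "__main__":
    data = [0, 0, 0xFFFFFFFF, 0, 7, 0xFFFFFFFF, 0, 0]
    sent = encode(data, ValueTable())
    print(compressed_bits(sent), "bits compressed vs", 32 * len(data), "bits raw")
    print(decode(sent, ValueTable()) == data)
```

Repeated values such as all-zero or all-one words are where a scheme like this gains: each hit costs a few bits instead of a full word, which is the source of the effective-bandwidth increase the abstract mentions.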

[1] H. B. Bakoglu, et al. Circuits, interconnections, and packaging for VLSI, 1990.

[2] Larry Rudolph, et al. Creating a wider bus using caching techniques, 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.

[3] Mikko H. Lipasti, et al. Value locality and load value prediction, 1996, ASPLOS VII.

[4] Anantha Chandrakasan, et al. Bus energy minimization by transition pattern coding (TPC) in deep sub-micron technologies, 2000, IEEE/ACM International Conference on Computer Aided Design. ICCAD - 2000. IEEE/ACM Digest of Technical Papers (Cat. No.00CH37140).

[5] Jun Yang, et al. Frequent value locality and value-centric data cache design, 2000, ASPLOS IX.

[6] Michael Zhang, et al. Highly-Associative Caches for Low-Power Processors, 2000.

[7] W. Dally, et al. Route packets, not wires: on-chip interconnection networks, 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[8] William J. Dally, et al. A delay model and speculative architecture for pipelined routers, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[9] Orion: a power-performance simulator for interconnection networks, 2002, MICRO 35.

[10] Fredrik Larsson, et al. Simics: A Full System Simulation Platform, 2002, Computer.

[11] Jörg Henkel, et al. A dictionary-based en/decoding scheme for low-power data buses, 2003, IEEE Trans. Very Large Scale Integr. Syst.

[12] Jaehyuk Huh, et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, 2003, ISCA '03.

[13] Anant Agarwal, et al. Scalar operand networks: on-chip interconnect for ILP in partitioned architectures, 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.

[14] T. N. Vijaykumar, et al. Exploring high bandwidth pipelined cache architecture for scaled technology, 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[15] John Kubiatowicz, et al. Exploiting prediction to reduce power on buses, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[16] K. Banerjee, et al. A global interconnect optimization scheme for nanometer scale VLSI with implications for latency, bandwidth, and power dissipation, 2004, IEEE Transactions on Electron Devices.

[17] William J. Dally, et al. Principles and Practices of Interconnection Networks, 2004.

[18] David A. Wood, et al. Adaptive cache compression for high-performance processors, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.

[19] David A. Wood, et al. Managing Wire Delay in Large Chip-Multiprocessor Caches, 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[20] Milo M. K. Martin, et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset, 2005, CARN.

[21] Steven K. Reinhardt, et al. A unified compressed memory hierarchy, 2005, 11th International Symposium on High-Performance Computer Architecture.

[22] Dean M. Tullsen, et al. Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[23] Karthik Ramani, et al. Interconnect-Aware Coherence Protocols for Chip Multiprocessors, 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[24] Alyssa B. Apsel, et al. Leveraging Optical Technology in Future Bus-based Chip Multiprocessors, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[25] Henry Hoffmann, et al. On-Chip Interconnection Architecture of the Tile Processor, 2007, IEEE Micro.

[26] Niraj K. Jha, et al. Express virtual channels: towards the ideal interconnection fabric, 2007, ISCA '07.

[27] Valentin Puente, et al. Rotary router: an efficient architecture for CMP interconnection networks, 2007, ISCA '07.

[28] William J. Dally, et al. Flattened Butterfly Topology for On-Chip Networks, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[29] D. Jayasimha, et al. On-Chip Interconnection Networks: Why They are Different and How to Compare Them, 2007.

[30] Sriram R. Vangal, et al. A 5-GHz Mesh Interconnect for a Teraflops Processor, 2007, IEEE Micro.

[31] Per Stenström, et al. Memory-Link Compression Schemes: A Value Locality Perspective, 2008, IEEE Transactions on Computers.

[32] Chita R. Das, et al. Performance and power optimization through data compression in Network-on-Chip architectures, 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.