Adaptive data compression for high-performance low-power on-chip networks

With the recent design shift towards increasing the number of processing elements on a chip, high-bandwidth support in the on-chip interconnect is essential for low-latency communication. Much of the previous work has focused on router architectures and network topologies using wide/long channels. However, such solutions may result in a complicated router design and a high interconnect cost. In this paper, we exploit a table-based data compression technique that relies on value patterns in cache traffic. Compressing a large packet into a small one can increase the effective bandwidth of routers and links, while saving power due to reduced operations. The main challenges are providing a scalable implementation of the tables and minimizing the latency overhead of compression. First, we propose a shared table scheme that needs only one encoding table and one decoding table for each processing element, together with a management protocol that does not require in-order delivery. Next, we present streamlined encoding, which combines flit injection and encoding in a pipeline. Furthermore, compression can be applied selectively, only to communication on congested paths and only when it improves performance. Simulation results for a 16-core CMP show that our compression method reduces packet latency by up to 44% (36% on average) and reduces network power consumption by 36% on average.
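To make the idea of table-based compression concrete, the following is a minimal sketch, not the scheme proposed in the paper. It assumes a single sender/receiver pair, 32-bit words, an 8-entry table with FIFO replacement, and in-order delivery so that the encoder and decoder tables stay synchronized; the paper's shared table scheme and its management protocol, which does not require in-order delivery, are not modeled. All names (`FifoValueTable`, `encode`, `decode`, `encoded_bits`) are hypothetical.

```python
WORD_BITS = 32  # assumed word size; not specified in the abstract


class FifoValueTable:
    """Small table of recently seen words (size and FIFO policy are assumptions)."""

    def __init__(self, entries=8):
        self.slots = [None] * entries
        self.next_slot = 0  # slot that the next inserted word will overwrite

    def find(self, word):
        """Return the slot index holding `word`, or None if it is absent."""
        try:
            return self.slots.index(word)
        except ValueError:
            return None

    def insert(self, word):
        self.slots[self.next_slot] = word
        self.next_slot = (self.next_slot + 1) % len(self.slots)


def encode(words, table):
    """Replace words found in the table with short indices."""
    symbols = []
    for word in words:
        index = table.find(word)
        if index is not None:
            symbols.append(("hit", index))
        else:
            symbols.append(("miss", word))
            table.insert(word)  # the decoder performs the same insert on a miss
    return symbols


def decode(symbols, table):
    """Rebuild the original words, mirroring the encoder's table updates."""
    words = []
    for kind, payload in symbols:
        if kind == "hit":
            words.append(table.slots[payload])
        else:
            words.append(payload)
            table.insert(payload)
    return words


def encoded_bits(symbols, entries=8):
    """Size in bits: a 1-bit hit/miss flag plus either an index or a raw word."""
    index_bits = max(1, (entries - 1).bit_length())
    return sum(1 + (index_bits if kind == "hit" else WORD_BITS)
               for kind, _ in symbols)


if __name__ == "__main__":
    payload = [0, 0, 0xFFFFFFFF, 0, 0x1234, 0xFFFFFFFF, 0, 0]
    symbols = encode(payload, FifoValueTable())
    assert decode(symbols, FifoValueTable()) == payload
    print(len(payload) * WORD_BITS, "raw bits ->", encoded_bits(symbols), "encoded bits")
```

In this sketch, repeated values such as zero are sent as a flag plus a 3-bit index instead of a full word, which is where the packet-size reduction comes from. A miss costs one extra flag bit, so traffic with little value repetition would grow slightly rather than shrink.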
