Performance and power optimization through data compression in Network-on-Chip architectures

The trend towards integrating multiple cores on the same die has accentuated the need for larger on-chip caches. Such large caches are constructed as a multitude of smaller cache banks interconnected through a packet-based network-on-chip (NoC) communication fabric. Thus, the NoC plays a critical role in optimizing the performance and power consumption of such non-uniform cache-based multicore architectures. While almost all prior NoC studies have focused on the design of router microarchitectures for achieving this goal, in this paper, we explore the role of data compression on NoC performance and energy behavior. In this context, we examine two different configurations that explore combinations of storage and communication compression: (1) Cache compression (CC) and (2) Compression in the NIC (NC). We also address techniques to hide the decompression latency by overlapping with NoC communication latency. Our simulation results with a diverse set of scientific and commercial benchmark traces reveal that CC can provide up to 33% reduction in network latency and up to 23% power savings. Even in the case of NC - where the data is compressed only when passing through the NoC fabric of the NUCA architecture and stored uncompressed - performance and power savings of up to 32% and 21%, respectively, can be obtained. These performance benefits in the interconnect translate up to 17% reduction in CPI. These benefits are orthogonal to any router architecture and make a strong case for utilizing compression for optimizing the performance and power envelope of NoC architectures. In addition, the study demonstrates the criticality of designing faster routers in shaping the performance behavior.

[1]  Arvin Park,et al.  Dynamic Base Register Caching: A Technique for Reducing Address Bus Width , 1991, ISCA.

[2]  Antonio Gonzalez,et al.  A data cache with multiple caching strategies tuned to different types of locality , 1995, International Conference on Supercomputing.

[3]  S. Jones,et al.  Design and performance of a main memory hardware data compressor , 1996, Proceedings of EUROMICRO 96. 22nd Euromicro Conference. Beyond 2000: Hardware and Software Design Strategies.

[4]  Wen-mei W. Hwu,et al.  Run-time adaptive cache management , 1998, Proceedings of the Thirty-First Hawaii International Conference on System Sciences.

[5]  Paul Barford,et al.  Generating representative Web workloads for network and server performance evaluation , 1998, SIGMETRICS '98/PERFORMANCE '98.

[6]  Sanjeev Kumar,et al.  Exploiting spatial locality in data caches using spatial footprints , 1998, ISCA.

[7]  Jang-Soo Lee,et al.  Design and evaluation of a selective compressed memory system , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).

[8]  Rajesh K. Gupta,et al.  Adapting cache line size to application behavior , 1999, ICS '99.

[9]  Shekhar Y. Borkar,et al.  Design challenges of technology scaling , 1999, IEEE Micro.

[10]  Jun Yang,et al.  Frequent Value Locality and Value-Centric Data Cache Design , 2000, ASPLOS.

[11]  R. Canal,et al.  Very low power pipelines using significance compression , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[12]  Michael E. Wazlowski,et al.  Pinnacle: IBM MXT in a Memory Controller Chip , 2001, IEEE Micro.

[13]  William J. Dally,et al.  Route packets, not wires: on-chip inteconnection networks , 2001, DAC '01.

[14]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[15]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[16]  K. Kant Compressibility Characteristics of Address / Data Transfers in Commercial Workloads , 2002 .

[17]  Ravishankar K. Iyer,et al.  Design and performance of compressed interconnects for high performance servers , 2003, Proceedings 21st International Conference on Computer Design.

[18]  Sharad Malik,et al.  Power-driven design of router microarchitectures in on-chip networks , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[19]  Li Shang,et al.  Thermal Modeling, Characterization and Management of On-Chip Networks , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[20]  Jun Yang,et al.  Frequent value encoding for low power data buses , 2004, TODE.

[21]  David A. Wood,et al.  Adaptive cache compression for high-performance processors , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[22]  David A. Wood,et al.  Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[23]  Simon W. Moore,et al.  Low-latency virtual-channel routers for on-chip networks , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[24]  David A. Wood,et al.  Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches , 2004 .

[25]  Dean M. Tullsen,et al.  Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[26]  Simon W. Moore,et al.  The design and implementation of a low-latency on-chip network , 2006, Asia and South Pacific Conference on Design Automation, 2006..

[27]  Chita R. Das,et al.  Exploring Fault-Tolerant Network-on-Chip Architectures , 2006, International Conference on Dependable Systems and Networks (DSN'06).

[28]  Li-Shiuan Peh,et al.  In-network cache coherence , 2006, IEEE Comput. Archit. Lett..

[29]  Chita R. Das,et al.  A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[30]  C. Nicopoulos,et al.  Design and Management of 3D Chip Multiprocessors Using Network-in-Memory , 2006, ISCA 2006.

[31]  David A. Wood,et al.  Interactions Between Compression and Prefetching in Chip Multiprocessors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[32]  Rajeev Balasubramonian,et al.  Interconnect design considerations for large NUCA caches , 2007, ISCA '07.