Design and performance of compressed interconnects for high performance servers

As microprocessors scale rapidly in frequency, the design of fast and efficient interconnects becomes extremely important for low latency data access and high performance. We evaluate a technique for reducing the interconnect width by exploiting the spatial and temporal locality in communication transfers (addresses & data). The width reduction implies a number of other advantages including higher operating frequency, reduced pin-count, lower chip & board cost, etc. We evaluate the effectiveness of the proposed scheme by performing trace-driven simulations for two well-known commercial server workloads (SPECWeb99 and TPC-C). We also study the sensitivity of the compression hit ratio with respect to the number of bits compressed, size of the encoding/decoding table used and the replacement policy. The results indicate that the proposed technique has a potential to reduce address bus width in most cases and data bus widths in some cases while maintaining equal or better performance than in the uncompressed case.

[1]  Arvin Park,et al.  Dynamic Base Register Caching: A Technique for Reducing Address Bus Width , 1991, ISCA.

[2]  Balakrishna R. Iyer,et al.  Data Compression Support in Databases , 1994, VLDB.

[3]  Larry Rudolph,et al.  Creating a wider bus using caching techniques , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.

[4]  S. Jones,et al.  Design and performance of a main memory hardware data compressor , 1996, Proceedings of EUROMICRO 96. 22nd Euromicro Conference. Beyond 2000: Hardware and Software Design Strategies.

[5]  David J. Craft,et al.  A fast hardware data compression algorithm and some algorithmic extensions , 1998, IBM J. Res. Dev..

[6]  Morten Kjelsø,et al.  Empirical study of memory-data: characteristics and compressibility , 1998 .

[7]  Yannis Smaragdakis,et al.  The Case for Compressed Caching in Virtual Memory Systems , 1999, USENIX Annual Technical Conference, General Track.

[8]  Jang-Soo Lee,et al.  An on-chip cache compression technique to reduce decompression overhead and design complexity , 2000, J. Syst. Archit..

[9]  Jun Yang,et al.  Frequent value compression in data caches , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[10]  Xiaowei Shen,et al.  Performance of hardware compressed main memory , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[11]  Michael E. Wazlowski,et al.  Pinnacle: IBM MXT in a Memory Controller Chip , 2001, IEEE Micro.

[12]  Ravishankar K. Iyer,et al.  Improving cache performance of network intensive workloads , 2001, International Conference on Parallel Processing, 2001..

[13]  Milos Prvulovic,et al.  Improving system performance with compressed memory , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.

[14]  Darko Kirovski,et al.  Procedure Based Program Compression , 2004, International Journal of Parallel Programming.