C-Pack: A High-Performance Microprocessor Cache Compression Algorithm

Microprocessor designers have been torn between tight constraints on the amount of on-chip cache memory and the high latency of off-chip memory, such as dynamic random access memory (DRAM). Accessing off-chip memory generally takes an order of magnitude more time than accessing on-chip cache, and two orders of magnitude more time than executing an instruction. Computer systems and microarchitecture researchers have proposed using hardware data compression units within the memory hierarchies of microprocessors to improve performance, energy efficiency, and functionality. However, most past work, and all work on cache compression, has made unsubstantiated assumptions about the performance, power consumption, and area overheads of the proposed compression algorithms and hardware. It is not possible to determine whether compression at the levels of the memory hierarchy closest to the processor is beneficial without understanding its costs. Furthermore, as we show in this paper, raw compression ratio is not always the most important metric. In this work, we present a lossless compression algorithm that has been designed for fast online data compression, and cache compression in particular. The algorithm has a number of novel features tailored for this application, including combining pairs of compressed lines into one cache line and allowing parallel compression of multiple words while using a single dictionary and without degradation in compression ratio. We reduced the proposed algorithm to a register-transfer-level (RTL) hardware design, permitting performance, power consumption, and area estimation. Experiments comparing our work to previous work are described.
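
The abstract describes the algorithm only at a high level. To make the idea of a dictionary-based, word-pattern cache compressor concrete, here is a minimal Python sketch. Its details are not stated in this abstract and should be read as assumptions: the six byte-level patterns and their code lengths, a 16-entry FIFO dictionary, a 64-byte line of 32-bit words, the rule that zero-patterned words and exact matches are not pushed into the dictionary, and a pairing test that only checks whether two compressed lines fit together in one line (ignoring any metadata). All function names (compress_line, decompress_line, compressed_bits, can_pair) are hypothetical.

```python
from collections import deque

DICT_ENTRIES = 16            # assumed: 16-entry FIFO dictionary
LINE_WORDS = 16              # assumed: 64-byte line of 32-bit words
LINE_BITS = LINE_WORDS * 32

# Assumed pattern table: name -> (code, payload bits). Leftmost letter is the
# most significant byte: z = zero, m = matches a dictionary entry, x = literal.
# Payload includes a 4-bit dictionary index where one is used.
PATTERNS = {
    "zzzz": ("00", 0),
    "xxxx": ("01", 32),
    "mmmm": ("10", 4),
    "mmxx": ("1100", 4 + 16),
    "zzzx": ("1101", 8),
    "mmmx": ("1110", 4 + 8),
}


def _match_level(d, w):
    """Count of matching high-order bytes usable by a pattern (4, 3, 2 or 0)."""
    if d == w:
        return 4
    if d >> 8 == w >> 8:
        return 3
    if d >> 16 == w >> 16:
        return 2
    return 0


def compress_line(words):
    """Encode 32-bit words as (pattern, dict_index, literal) tokens."""
    dictionary = deque(maxlen=DICT_ENTRIES)  # full deque evicts its oldest entry
    tokens = []
    for w in words:
        if w == 0:
            tokens.append(("zzzz", None, None))
            continue
        if w <= 0xFF:
            tokens.append(("zzzx", None, w))
            continue
        # Pick the entry with the most matching high-order bytes (ties: lowest index).
        best_idx, best_level = None, 0
        for i, d in enumerate(dictionary):
            level = _match_level(d, w)
            if level > best_level:
                best_idx, best_level = i, level
        if best_level == 4:
            tokens.append(("mmmm", best_idx, None))
            continue  # exact match: dictionary left unchanged (assumed rule)
        if best_level == 3:
            tokens.append(("mmmx", best_idx, w & 0xFF))
        elif best_level == 2:
            tokens.append(("mmxx", best_idx, w & 0xFFFF))
        else:
            tokens.append(("xxxx", None, w))
        dictionary.append(w)  # partial match or miss: push word (assumed rule)
    return tokens


def decompress_line(tokens):
    """Rebuild the words, mirroring the compressor's dictionary updates."""
    dictionary = deque(maxlen=DICT_ENTRIES)
    words = []
    for pattern, idx, lit in tokens:
        if pattern == "zzzz":
            w = 0
        elif pattern in ("zzzx", "xxxx"):
            w = lit
        elif pattern == "mmmm":
            w = dictionary[idx]
        elif pattern == "mmmx":
            w = (dictionary[idx] & 0xFFFFFF00) | lit
        else:  # "mmxx"
            w = (dictionary[idx] & 0xFFFF0000) | lit
        if pattern not in ("zzzz", "zzzx", "mmmm"):
            dictionary.append(w)
        words.append(w)
    return words


def compressed_bits(tokens):
    """Total encoded size of a line in bits."""
    return sum(len(PATTERNS[p][0]) + PATTERNS[p][1] for p, _, _ in tokens)


def can_pair(bits_a, bits_b, line_bits=LINE_BITS):
    """Hypothetical pairing test: both compressed lines fit in one physical line."""
    return bits_a + bits_b <= line_bits
```

For example, an all-zero line encodes to 32 bits and a line repeating one nonzero word to 124 bits, so the two could share one 512-bit line, while a line of unrelated words exceeds 512 bits and would not be paired. This sketch compresses one word at a time; the abstract's claim is that the hardware compresses several words in parallel against a single dictionary without losing compression ratio. One way to achieve that (an assumption here, not taken from the abstract) is to let a later word in the same cycle also match against an earlier word's not-yet-committed dictionary entry, so the output equals this sequential order.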