The performance advantage of applying compression to the memory system

The memory system stores information that consists primarily of instructions and data and secondarily of address information, such as cache tag fields. It interacts with the processor by carrying the related traffic, which again consists of addresses, instructions, and data. Continuing exponential growth in processor performance, combined with technology, architecture, and application trends, places enormous demands on the memory system to store and exchange this information at sufficiently high performance, that is, to provide low-latency, high-bandwidth access to large amounts of information. This paper comprehensively analyzes the redundancy in the information (addresses, instructions, and data) stored in the memory system and exchanged between it and the processor, and evaluates the potential of compression to improve memory system performance. The analysis uses traces obtained with Sun Microsystems' Shade simulator running SPARC executables of nine integer and six floating-point programs from the SPEC CPU2000 benchmark suite, and it yields impressive results. Well-designed compression schemes may provide performance benefits that far outweigh the extra time and logic needed for compression and decompression. This advantage is expected to grow, because the speed and size of logic (which performs the compression and decompression) are improving, and are projected to keep improving, at a much higher rate than those of interconnect (which communicates the information), both on-chip and off-chip.
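As a rough illustration of what measuring redundancy in a trace can mean, the sketch below estimates the zeroth-order Shannon entropy of a stream of 32-bit words and the corresponding upper bound on the compression ratio. It is not the paper's methodology: the function names, the 32-bit word size, and the sample `trace` values are all assumptions made for this example, and a zeroth-order estimate ignores correlation between successive words.

```python
import math
from collections import Counter

def entropy_bits_per_word(words):
    """Zeroth-order Shannon entropy of a word sequence, in bits per word."""
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def compression_bound(words, word_bits=32):
    """Raw word size divided by the entropy estimate (higher = more redundancy)."""
    h = entropy_bits_per_word(words)
    return float("inf") if h == 0 else word_bits / h

# Hypothetical trace of 32-bit data values, dominated by zeros and small integers.
trace = [0, 0, 0, 0, 1, 1, 0xFFFFFFFF, 0x1000]

print(entropy_bits_per_word(trace))  # bits of information per 32-bit word
print(compression_bound(trace))      # idealized compression ratio for this sample
```

On a real trace the same calculation could be applied separately to the address, instruction, and data streams. A short sample like this one tends to understate the entropy, so the ratio should be read as an optimistic bound rather than an achievable figure.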
