Split-stream dictionary program compression

This paper describes split-stream dictionary (SSD) compression, a new technique for transforming programs into a compact, interpretable form. We define a compressed program as interpretable when it can be decompressed at basic-block granularity with reasonable efficiency. This granularity requirement enables interpreters or just-in-time (JIT) translators to decompress basic blocks incrementally during program execution. Our previous approach to interpretable compression, the Byte-coded RISC (BRISC) program format [1], achieved unprecedented decompression speed, in excess of 5 megabytes per second on a 450 MHz Pentium II, while compressing benchmark programs to an average of three-fifths the size of their optimized x86 representation. SSD compression combines the key idea behind BRISC with new observations about instruction re-use frequencies to yield four advantages over BRISC and other competing techniques. First, SSD is simple, requiring only a few pages of code for an effective implementation. Second, SSD compresses programs more effectively than any interpretable program compression scheme known to us. For example, SSD compressed a set of programs, including the SPEC95 benchmarks and Microsoft Word97, to less than half the size of their optimized x86 representation on average. Third, SSD exceeds BRISC's decompression and JIT translation rates by over 50%. Finally, SSD's two-phase approach to JIT translation enables a virtual machine to degrade program execution time gracefully as RAM constraints tighten. For example, using SSD, we ran Word97 with a JIT-translation buffer one-third the size of Word97's optimized x86 code, yet incurred only 27% execution time overhead.
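To make the notion of an interpretable compressed program concrete, the following toy sketch shows dictionary coding in which each basic block can be decompressed on its own. It is an illustration under stated assumptions, not the SSD algorithm itself: the abstract does not specify the encoding, so the function names (`build_dictionary`, `compress`, `decompress_block`), the use of instruction strings, and the frequency-ordered dictionary are all hypothetical, and the splitting of instruction fields into separate streams is omitted entirely.

```python
from collections import Counter

def build_dictionary(blocks):
    """Return the distinct instructions, most frequently re-used first.

    Hypothetical helper: ties are broken alphabetically so the
    result is deterministic.
    """
    counts = Counter(instr for block in blocks for instr in block)
    return sorted(counts, key=lambda instr: (-counts[instr], instr))

def compress(blocks):
    """Replace every instruction with its dictionary index.

    Each basic block keeps its own index list, so a block can later
    be decompressed without touching any other block.
    """
    dictionary = build_dictionary(blocks)
    index_of = {instr: n for n, instr in enumerate(dictionary)}
    streams = [[index_of[instr] for instr in block] for block in blocks]
    return dictionary, streams

def decompress_block(dictionary, streams, block_id):
    """Recover a single basic block (basic-block granularity)."""
    return [dictionary[n] for n in streams[block_id]]

if __name__ == "__main__":
    # Two made-up basic blocks that share some instructions.
    program = [
        ["push ebp", "mov ebp, esp", "ret"],
        ["push ebp", "ret"],
    ]
    dictionary, streams = compress(program)
    # An interpreter or JIT translator could expand only the block
    # it is about to execute.
    print(decompress_block(dictionary, streams, 1))
```

In this sketch, frequently re-used instructions receive small indices, which a real encoder could exploit with shorter codes. How SSD actually assigns and encodes indices is not stated in the abstract.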

[1]  Michael Franz. Adaptive Compression of Syntax Trees and Iterative Dynamic Code Optimization: Two Basic Technologies for Mobile Object Systems, 1996, Mobile Object Systems.

[2]  Abraham Lempel et al. On the Complexity of Finite Sequences, 1976, IEEE Trans. Inf. Theory.

[3]  Michael Franz et al. Slim binaries, 1997, CACM.

[4]  David A. Patterson et al. Computer Architecture: A Quantitative Approach, 1990.

[5]  Ken Arnold et al. The Java Programming Language, 1996.

[6]  Ken Arnold et al. The Java Programming Language (2nd ed.), 1998.

[7]  William Pugh et al. Compressing Java class files, 1999, PLDI '99.

[8]  Tong Lai Yu. Data Compression for PC Software Distribution, 1996, Softw. Pract. Exp.

[9]  Rafael Dueire Lins et al. Garbage collection: algorithms for automatic dynamic memory management, 1996.

[10]  David A. Patterson et al. Computer Architecture: A Quantitative Approach (2nd ed.), 1996.

[11]  Christopher W. Fraser. Automatic inference of models for statistical code compression, 1999, PLDI '99.

[12]  Abraham Lempel et al. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[13]  Ian H. Witten et al. Arithmetic coding for data compression, 1987, CACM.

[14]  Randy H. Katz et al. Next century challenges: mobile networking for “Smart Dust”, 1999, MobiCom.

[15]  Robert Wahbe et al. Efficient and language-independent mobile programs, 1996, PLDI '96.

[16]  Steve Furber et al. ARM System Architecture, 1996.

[17]  Robert Wahbe et al. Omniware: A Universal Substrate for Web Programming, 1996, World Wide Web J.