Bytecode compression via profiled grammar rewriting

This paper describes the design and implementation of a system that produces compact bytecoded instruction sets together with interpreters for them. The system accepts a grammar for programs written in a simple stack-based bytecode, along with a training set of sample programs. It transforms the grammar into an expanded grammar that generates the same language as the original but permits shorter derivations of the sample programs and of others like them. A program's derivation under the expanded grammar is its compressed bytecode representation. The interpreter for this bytecode is generated automatically from the original bytecode interpreter and the expanded grammar. Programs expressed in compressed bytecode can be substantially smaller than their original bytecode representation, and even smaller than their machine code representation. For example, compression cuts the bytecode for lcc from 199KB to 58KB while increasing the size of the interpreter by just over 11KB.
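
The core idea, that a derivation can serve as the encoding and that adding rules to the grammar shortens it, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's actual system: the toy grammar (written in prefix order to keep the parser simple), the names inline, encode and decode, and the two hand-picked inlinings, which stand in for the choices a real system would make by profiling the training set.

```python
from functools import lru_cache

# Hypothetical toy grammar: nonterminal -> list of alternative right-hand sides.
ORIGINAL = {"E": [("ADD", "E", "E"), ("MUL", "E", "E"), ("LOAD",), ("CONST",)]}

def inline(grammar, nt, rule, pos, child_rule):
    """Copy the grammar and add one rule for nt: rule `rule` with the
    nonterminal at position `pos` replaced by that nonterminal's `child_rule`."""
    rhs = grammar[nt][rule]
    child = grammar[rhs[pos]][child_rule]
    expanded = {k: list(v) for k, v in grammar.items()}
    expanded[nt].append(rhs[:pos] + child + rhs[pos + 1:])
    return expanded

def encode(grammar, start, tokens):
    """Shortest leftmost derivation of tokens, as a list of rule indices."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def parse_symbol(sym, i):
        # Maps end position -> shortest derivation of tokens[i:end] from sym.
        if sym not in grammar:  # terminal symbol
            return {i + 1: ()} if i < len(tokens) and tokens[i] == sym else {}
        best = {}
        for idx, rhs in enumerate(grammar[sym]):
            for end, deriv in parse_seq(rhs, i).items():
                cand = (idx,) + deriv
                if end not in best or len(cand) < len(best[end]):
                    best[end] = cand
        return best

    @lru_cache(maxsize=None)
    def parse_seq(rhs, i):
        # Same mapping, for a sequence of symbols.
        if not rhs:
            return {i: ()}
        best = {}
        for mid, d1 in parse_symbol(rhs[0], i).items():
            for end, d2 in parse_seq(rhs[1:], mid).items():
                cand = d1 + d2
                if end not in best or len(cand) < len(best[end]):
                    best[end] = cand
        return best

    return list(parse_symbol(start, 0)[len(tokens)])

def decode(grammar, start, derivation):
    """Replay a leftmost derivation back into the token sequence."""
    rules = iter(derivation)

    def expand(sym):
        if sym not in grammar:
            return [sym]
        out = []
        for s in grammar[sym][next(rules)]:
            out += expand(s)
        return out

    return expand(start)

program = ["ADD", "LOAD", "MUL", "LOAD", "CONST"]

# Hand-picked expansions: add E -> ADD LOAD E and E -> MUL LOAD E.
expanded = inline(ORIGINAL, "E", 0, 1, 2)
expanded = inline(expanded, "E", 1, 1, 2)

print(encode(ORIGINAL, "E", program))   # [0, 2, 1, 2, 3]
print(encode(expanded, "E", program))   # [4, 5, 3]
print(decode(expanded, "E", [4, 5, 3]) == program)  # True
```

The expanded grammar still accepts every program the original accepts, since each new rule is just two original rules applied together, but the sample program now needs three rule choices instead of five. An interpreter for the compressed form would dispatch on rule indices, with the handler for a combined rule performing the work of the rules that were merged into it.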
