The implementation and evaluation of dynamic code decompression using DISE

Code compression coupled with dynamic decompression is an important technique for both embedded and general-purpose microprocessors. Postfetch decompression, in which decompression is performed after the compressed instructions have been fetched, allows the instruction cache to store compressed code but requires a highly efficient decompression implementation. We propose implementing postfetch decompression using a new hardware facility called dynamic instruction stream editing (DISE). DISE provides a programmable decoder---similar in structure to those in many IA-32 processors---that is used to add functionality to an application by injecting custom code snippets into its fetched instruction stream. We present a DISE-based implementation of postfetch decompression and show that it naturally supports customized program-specific decompression dictionaries, enables parameterized decompression allowing similar-but-not-identical instruction sequences to share dictionary entries, and uses no decompression-specific hardware. We present extensive experimental results showing the virtue of this approach and evaluating the factors that impact its efficacy. We also present implementation-neutral results that give insight into the characteristics of any postfetch decompression technique. Our experiments not only demonstrate significant reduction in code size (up to 35&percent;) but also significant improvements in performance (up to 20&percent;) and energy (up to 10&percent;).

[1]  Thomas G. Szymanski,et al.  Assembling code for machines with span-dependent instructions , 1978, CACM.

[2]  Andrew Wolfe,et al.  Executing compressed programs on an embedded RISC architecture , 1992, MICRO 1992.

[3]  A. Wolfe,et al.  Executing Compressed Programs On An Embedded RISC Architecture , 1992, [1992] Proceedings the 25th Annual International Symposium on Microarchitecture MICRO 25.

[4]  Norman P. Jouppi,et al.  WRL Research Report 93/5: An Enhanced Access and Cycle Time Model for On-chip Caches , 1994 .

[5]  Christopher W. Fraser,et al.  Code compression , 1997, PLDI '97.

[6]  Kevin D. Kissell MIPS16: High-density MIPS for the Embedded Market1 , 1997 .

[7]  Trevor N. Mudge,et al.  Improving code density using compression techniques , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[8]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[9]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[10]  Robert K. Montoye,et al.  A decompression core for PowerPC , 1998, IBM J. Res. Dev..

[11]  Guido Araujo,et al.  Code compression based on operand factorization , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[12]  Keith D. Cooper,et al.  Enhanced code compression for embedded RISC processors , 1999, PLDI '99.

[13]  S. Debray,et al.  Compiler Techniques for Code Compression , 1999 .

[14]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[15]  Kurt Keutzer,et al.  A text-compression-based method for code size minimization in embedded systems , 1999, TODE.

[16]  Sang-Joon Nam,et al.  Improving dictionary-based code compression in VLIW architectures , 1999 .

[17]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[18]  Bjorn De Sutter,et al.  Compiler techniques for code compaction , 2000, TOPL.

[19]  P. Glaskowski Pentium 4 (partially) previewed , 2000 .

[20]  Jörg Henkel,et al.  Code compression for low power embedded system design , 2000, Proceedings 37th Design Automation Conference.

[21]  Trevor N. Mudge,et al.  Reducing code size with run-time decompression , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[22]  Saumya K. Debray,et al.  Profile-guided code compression , 2002, PLDI '02.

[23]  Babak Falsafi,et al.  Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[24]  Richard Phelan Improving ARM Code Density and Performance , 2003 .

[25]  Amir Roth,et al.  DISE: a programmable macro engine for customizing applications , 2003, ISCA '03.

[26]  Amir Roth,et al.  A DISE implementation of dynamic code decompression , 2003, LCTES.

[27]  Darko Kirovski,et al.  Procedure Based Program Compression , 2004, International Journal of Parallel Programming.

[28]  Fang Yu Code Compression ∗ , 2006 .