Compiler-driven cached code compression schemes for embedded ILP processors

During the last 15 years, embedded systems have grown in complexity and performance to rival desktop systems. The architectures of these systems present unique challenges to processor microarchitecture, including instruction encoding and instruction fetch. This paper presents new techniques for reducing embedded system code size without reducing functionality. The approach is to extract the pipeline decoder logic for an embedded VLIW processor in software at system development time. Code size is reduced by Huffman compressing the original program or by re-encoding it in a tailored ISA. A notable result is that the degree of ROM compression does not translate into an improvement in instructions delivered per cycle. Experiments found that when the misprediction penalty of the added Huffman decoder stage is taken into account, the Tailored ISA approach produces higher performance. Methods that compress the entire operation using Huffman encodings and decompress at ICache hit time still achieve a median performance advantage while providing higher ROM size savings. All results were generated by an optimizing compiler and tool suite, and are presented for an encoding similar to the Intel/HP IA-64 architecture.
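
To make the ROM-size trade-off between the two encodings concrete, the following is a minimal sketch, not the paper's tool chain. It assumes a hypothetical opcode stream, a hypothetical 8-bit original opcode field, and it reads "tailored encoding" simply as a fixed-width field just wide enough for the opcodes the program actually uses. The function names `huffman_code_lengths` and `tailored_width` are illustrative only.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def tailored_width(freqs):
    """Fixed field width (bits) that distinguishes only the symbols in use."""
    return max(1, (len(freqs) - 1).bit_length())


# Hypothetical opcode stream; the mnemonics and counts are assumptions.
program = ["add"] * 8 + ["load"] * 4 + ["store"] * 2 + ["branch"] + ["mul"]
freqs = Counter(program)

lengths = huffman_code_lengths(freqs)
huffman_bits = sum(freqs[s] * lengths[s] for s in freqs)
tailored_bits = len(program) * tailored_width(freqs)
original_bits = len(program) * 8  # assumed 8-bit opcode field

print("original:", original_bits, "bits")
print("tailored:", tailored_bits, "bits")
print("huffman: ", huffman_bits, "bits")
```

The sketch measures storage only. The variable-length Huffman codes come out smaller here, but decoding them needs an extra stage, and the abstract reports that this decode cost is why the smaller ROM image does not yield more instructions delivered per cycle.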
