Squashing microcode stores to size in embedded systems while delivering rapid microcode accesses

Microcoded customized IPs offer superior performance and direct programmability of micro-architectural structures compared to instruction-based processors, but at the cost of drastically enlarged code sizes. Code compression can reduce the size, but it must be applied with attention to performance so that the benefits of microcoded IPs are not squandered in the process. To this end, we propose a fast code compression technique that exploits the sizable number of unspecified (don't-care) bits that microcodes contain. Although the values and positions of the specified bits are highly irregular, the proposed technique can still fill in these specified bits flexibly and precisely by means of a linear network. The linearity inherent in the compression strategy in turn enables an extremely low-overhead decompression engine: at runtime, a fixed-bandwidth XOR network generates the decompressed code so that every specified bit takes its required value. Combining the proposed flexible XOR-based network with a minimal two-level storage for highly specified fields, such as immediate values, yields substantial code compression at negligible performance and hardware overhead.
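
The idea of filling only the specified bits through a linear network can be illustrated with a small sketch. It is not the engine described in the paper: the randomly generated network, the seed width, and the names `make_xor_network`, `compress`, and `decompress` are assumptions chosen for illustration. Each output bit is the XOR of a fixed subset of seed bits, so the specified bits of a microcode word form a linear system over GF(2) that compression solves for a short seed, while the unspecified bits (`x`) impose no constraint.

```python
import random

def make_xor_network(n_out, n_seed, rng):
    """Hypothetical fixed network: output bit j is the XOR of the seed
    bits selected by the nonzero bitmask masks[j]."""
    return [rng.randrange(1, 1 << n_seed) for _ in range(n_out)]

def decompress(seed, masks):
    """Expand a seed: each output bit is the parity of (seed AND mask)."""
    return [bin(seed & m).count("1") & 1 for m in masks]

def compress(word, masks, n_seed):
    """Find a seed whose expansion matches every specified bit of `word`
    (a string over '0', '1', 'x'). Returns None if no seed exists."""
    low = (1 << n_seed) - 1
    pivots = {}  # pivot seed-bit index -> reduced row (rhs kept in bit n_seed)
    for j, c in enumerate(word):
        if c == "x":
            continue  # unspecified bit: no equation
        row = masks[j] | (int(c) << n_seed)
        for p, r in pivots.items():  # eliminate known pivots from the new row
            if (row >> p) & 1:
                row ^= r
        coeff = row & low
        if coeff == 0:
            if row >> n_seed:
                return None  # 0 = 1: this word does not fit the network
            continue  # redundant equation
        p = coeff.bit_length() - 1
        for q in list(pivots):  # keep earlier rows free of the new pivot
            if (pivots[q] >> p) & 1:
                pivots[q] ^= row
        pivots[p] = row
    # Free seed bits are set to 0, so each pivot bit equals its row's rhs.
    seed = 0
    for p, r in pivots.items():
        if (r >> n_seed) & 1:
            seed |= 1 << p
    return seed

# Usage: a 16-bit word with 5 specified bits, compressed to an 8-bit seed.
masks = make_xor_network(16, 8, random.Random(0))
reference = decompress(0b10110010, masks)  # guarantees a solvable example
word = "".join(str(b) if i in (1, 4, 7, 10, 13) else "x"
               for i, b in enumerate(reference))
seed = compress(word, masks, 8)
expanded = decompress(seed, masks)
assert all(c == "x" or int(c) == expanded[i] for i, c in enumerate(word))
```

In this sketch, decompression is a single layer of parity computations with no table lookup or sequential decoding, which is the property that keeps a linear decompressor cheap. A word with more independent specified bits than seed bits may have no solution; that case is where a separate store for highly specified fields becomes useful.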
