Microcode Compression Using Structured-Constrained Clustering

Modern microprocessors use microcode to implement legacy (rarely used) instructions, add new ISA features, and enable patches to an existing design. As more features are added to processors (e.g., protection and virtualization), the area and power costs associated with the microcode memory have increased significantly. For a recent Intel internal design targeted at low power and a small footprint, the cost of the microcode ROM was estimated to approach 20% of the total die area (and associated power consumption). Moreover, with the adoption of multicore architectures, the impact of microcode memory size on chip area has become relevant, forcing industry to revisit the microcode size problem. One solution is to store the microcode in compressed form and decompress it at runtime. This paper describes techniques for microcode compression that achieve significant area and power savings, and proposes a streamlined architecture that enables high throughput within the constraints of a high-performance CPU. The paper presents results for microcode compression on several commercial CPU designs, which demonstrate compression ratios ranging from 50 to 62%. In addition, it proposes techniques that enable the reuse of (pre-validated) hardware building blocks, which can considerably reduce the cost and design time of the microcode decompression engine in real-world designs.
