Characterization of Repeating Dynamic Code Fragments

For this study, we analyze the dynamic instruction streams of the SPEC2000 integer benchmarks to find frequently occurring units of computation, or idioms. An idiom, in the broadest sense, is an interdependent piece of a computation's dataflow. For example, a load-add-store idiom performs an increment operation through a set of three interdependent instructions. Using a heuristic technique that performs an exhaustive analysis on selected regions of an application's instruction stream, we derive a small set of idioms, each consisting of between three and eight Alpha instructions, that covers a non-trivial fraction of the overall stream. Averaged across the benchmarks, a set of ten idioms (50 total instructions) spans over 26% of the instruction stream. We provide a catalog of the top idiom occurring in each of the benchmarks. This catalog offers insight into the types of small-scale computation that are frequent in general code. For each idiom, we identify the locations in the source code from which it originates. Many idioms occur in multiple static locations. We outline some potential applications for such idioms, including techniques for cache compression, more effective instruction dispersal in a clustered architecture, and specialized instructions for a customizable instruction set. We investigate one application in more depth: reducing some redundancy in trace caches, and potentially boosting fetch bandwidth, by carefully and systematically encoding frequent idioms into smaller instruction words. We demonstrate that a simple decoder suffices to reconstitute the instruction stream.
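To make the notions of an idiom and of stream coverage concrete, the following is a minimal sketch on a toy trace. It is not the heuristic used in the study: it only finds linear chains of three dependent instructions by tracking the most recent writer of each operand, and the trace, the operand names, and the helper functions `dependence_chains` and `coverage` are all hypothetical.

```python
from collections import Counter

# Toy dynamic trace: (opcode, destination, sources).
# Opcodes are Alpha mnemonics; operand names are made up for illustration.
TRACE = [
    ("ldq", "r1", ("mem_a",)),
    ("addq", "r1", ("r1",)),
    ("stq", "mem_a", ("r1",)),
    ("ldq", "r2", ("mem_b",)),
    ("addq", "r2", ("r2",)),
    ("stq", "mem_b", ("r2",)),
    ("cmpeq", "r3", ("r4",)),
]


def dependence_chains(trace, length=3):
    """Return (opcode signature, indices) for each linear chain of
    `length` instructions linked by producer-consumer dependences."""
    last_writer = {}  # operand name -> index of its most recent writer
    producer = []     # producer[i] -> index feeding instruction i, or None
    for i, (_op, dest, srcs) in enumerate(trace):
        producer.append(
            next((last_writer[s] for s in srcs if s in last_writer), None)
        )
        last_writer[dest] = i

    chains = []
    for i in range(len(trace)):
        idxs = [i]
        # Walk backwards along producers until the chain is long enough.
        while len(idxs) < length and producer[idxs[-1]] is not None:
            idxs.append(producer[idxs[-1]])
        if len(idxs) == length:
            idxs.reverse()
            chains.append((tuple(trace[j][0] for j in idxs), tuple(idxs)))
    return chains


def coverage(trace, chains, signature):
    """Fraction of the trace covered by occurrences of one idiom."""
    covered = set()
    for sig, idxs in chains:
        if sig == signature:
            covered.update(idxs)
    return len(covered) / len(trace)


chains = dependence_chains(TRACE)
counts = Counter(sig for sig, _ in chains)
top_idiom, occurrences = counts.most_common(1)[0]
print(top_idiom, occurrences, coverage(TRACE, chains, top_idiom))
```

On this toy trace the load-add-store chain occurs twice and accounts for six of the seven instructions. A real analysis would have to handle idioms with non-linear dataflow and overlapping occurrences, which this sketch ignores.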
