A new idiom recognition framework for exploiting hardware-assist instructions

Modern processors support hardware-assist instructions (such as the TRT and TROT instructions on IBM zSeries) to accelerate certain functions such as delimiter search and character conversion. Such special instructions have often been used in high-performance libraries, but optimizing compilers have exploited them only in a few limited cases. We propose a new idiom recognition technique, derived from a topological embedding algorithm [11], that detects idiom patterns in the input program more aggressively than previous approaches. Our approach can detect a pattern even if the code segment does not exactly match the idiom; for example, it can detect a code segment that includes additional code within the idiom pattern. We implemented our approach in the Java Just-In-Time (JIT) compiler that is part of the J9 Java Virtual Machine, and we supported several important idioms for special hardware-assist instructions on the IBM zSeries and on some models of the IBM pSeries. To demonstrate the effectiveness of our technique, we performed two experiments. The first measures how many more patterns we can detect than the previous approach; the second measures how much more performance improvement we can achieve over it. For the first experiment we used the Java Compatibility Kit (JCK) API tests, and for the second we used the IBM XML parser, SPECjvm98, and SPECjbb2000. In summary, relative to a baseline implementation using exact pattern matching, our algorithm converted 75% more loops in the JCK tests. We also observed average performance improvements on a z990 of 64% for the XML parser, 1% for SPECjvm98, and 2% for SPECjbb2000. Finally, the JIT compilation time increased by only 0.32% to 0.44%.
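The abstract does not spell out the matching algorithm, so what follows is only a minimal Python sketch, under stated assumptions, of why tolerant matching finds more loops than exact matching. It is a simplified relaxation of topological embedding [11]: each pattern node maps to a distinct target node with the same label, and each pattern edge maps to a target path whose interior avoids every matched node. Unlike a true topological embedding, it does not require the paths to be pairwise vertex-disjoint. All names (`match`, `has_path`, labels such as `load_char` and `table_lookup`) and the loop graphs are hypothetical. The graphs are a made-up encoding of a delimiter-search loop of the kind an instruction such as TRT accelerates.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical encoding: node id -> operation label, node id -> successor ids.
Labels = Dict[str, str]
Edges = Dict[str, List[str]]


def has_path(edges: Edges, src: str, dst: str, blocked: Set[str]) -> bool:
    """Is there a path of length >= 1 from src to dst whose interior
    nodes all lie outside `blocked`?"""
    queue = deque(edges.get(src, []))
    seen: Set[str] = set()
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        if node in seen or node in blocked:
            continue  # never route a path through an already-matched node
        seen.add(node)
        queue.extend(edges.get(node, []))
    return False


def match(p_labels: Labels, p_edges: Edges, t_labels: Labels, t_edges: Edges,
          exact: bool = False) -> Optional[Dict[str, str]]:
    """Backtracking search for an injective, label-preserving map from pattern
    nodes to target nodes. exact=True: every pattern edge must be a target
    edge. exact=False: a pattern edge may stretch over a path of unmatched
    target nodes (extra code inside the idiom)."""
    order = list(p_labels)
    mapping: Dict[str, str] = {}

    def edges_ok(final: bool) -> bool:
        # During the search, plain reachability is a cheap, sound filter; the
        # stricter "interior avoids matched nodes" test runs once complete.
        blocked = set(mapping.values()) if final else set()
        for u, succs in p_edges.items():
            for v in succs:
                if u not in mapping or v not in mapping:
                    continue
                src, dst = mapping[u], mapping[v]
                if exact:
                    ok = dst in t_edges.get(src, [])
                else:
                    ok = has_path(t_edges, src, dst, blocked)
                if not ok:
                    return False
        return True

    def search(i: int) -> bool:
        if i == len(order):
            return edges_ok(final=True)
        p = order[i]
        used = set(mapping.values())
        for t, label in t_labels.items():
            if label != p_labels[p] or t in used:
                continue
            mapping[p] = t
            if edges_ok(final=False) and search(i + 1):
                return True
            del mapping[p]  # undo and try the next candidate
        return False

    return dict(mapping) if search(0) else None


# Hypothetical, simplified graph of a delimiter-search loop such as
#   while (true) { c = a[i]; if (table[c] != 0) break; i++; }
PATTERN_LABELS = {"ld": "load_char", "tbl": "table_lookup",
                  "cmp": "test_nonzero", "inc": "increment_index",
                  "br": "loop_branch"}
PATTERN_EDGES = {"ld": ["tbl"], "tbl": ["cmp"], "cmp": ["inc"],
                 "inc": ["br"], "br": ["ld"]}

# The same loop with one extra statement (say, count++) between the test
# and the index increment.
LOOP_LABELS = {"n1": "load_char", "n2": "table_lookup", "n3": "test_nonzero",
               "n4": "add", "n5": "increment_index", "n6": "loop_branch"}
LOOP_EDGES = {"n1": ["n2"], "n2": ["n3"], "n3": ["n4"], "n4": ["n5"],
              "n5": ["n6"], "n6": ["n1"]}

if __name__ == "__main__":
    print(match(PATTERN_LABELS, PATTERN_EDGES, LOOP_LABELS, LOOP_EDGES, exact=True))
    # None
    print(match(PATTERN_LABELS, PATTERN_EDGES, LOOP_LABELS, LOOP_EDGES))
    # {'ld': 'n1', 'tbl': 'n2', 'cmp': 'n3', 'inc': 'n5', 'br': 'n6'}
```

Here `exact=True` plays the role of an exact-matching baseline. It rejects the loop because the extra `add` node breaks the direct edge from the test to the index increment, while the tolerant mode matches it by stretching that edge over `n4`. The sketch only detects the match. It does not transform the loop or deal with the extra code, and its backtracking search is exponential in the worst case.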

[1]  Hiroyuki Sato, et al.  Array form representation of idiom recognition system for numerical programs, 2000, APL '01.

[2]  Toshiaki Yasue, et al.  Evolution of a Java just-in-time compiler for IA-32 platforms, 2004, IBM J. Res. Dev.

[3]  Scott A. Mahlke, et al.  An architecture framework for transparent instruction set customization in embedded processors, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[4]  Toshiaki Yasue, et al.  Overview of the IBM Java Just-in-Time Compiler, 2000, IBM Syst. J.

[5]  Nikola Grcevski, et al.  Java Just-in-Time Compiler and Virtual Machine Improvements for Server and Middleware Applications, 2004, Virtual Machine Research and Technology Symposium.

[6]  Saman P. Amarasinghe, et al.  Exploiting superword level parallelism with multimedia instruction sets, 2000, PLDI '00.

[7]  Rudolf Eigenmann, et al.  An Overview of Symbolic Analysis Techniques Needed for the Effective Parallelization of the Perfect Benchmarks, 1994, 1994 International Conference on Parallel Processing Vol. 2.

[8]  Toshio Nakatani, et al.  Stride prefetching by dynamically inspecting objects, 2003, PLDI '03.

[9]  Patrick Cousot, et al.  Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints, 1977, POPL.

[10]  Hiroyuki Sato.  Array form representation of idiom recognition system for numerical programs, 2000, ACM SIGAPL APL Quote Quad.

[11]  James Jianghai Fu, et al.  Directed Graph Pattern Matching and Topological Embedding, 1997, J. Algorithms.

[12]  Steven S. Muchnick.  Advanced Compiler Design and Implementation, 1997.

[13]  Toshio Nakatani, et al.  Partial redundancy elimination for access expressions by speculative code motion, 2004, Softw. Pract. Exp.

[14]  Toshiaki Yasue, et al.  A dynamic optimization framework for a Java just-in-time compiler, 2001, OOPSLA '01.

[15]  David Whalley, et al.  Effectively exploiting indirect jumps, 1999.

[16]  Barbara G. Ryder, et al.  Lattice frameworks for multisource and bidirectional data flow problems, 1995, TOPLAS.

[17]  Rudolf Eigenmann, et al.  Idiom recognition in the Polaris parallelizing compiler, 1995, ICS '95.

[18]  Timothy J. Slegel, et al.  The IBM eServer z990 microprocessor, 2004, IBM J. Res. Dev.

[19]  Michael Leuschel, et al.  A framework for the integration of partial evaluation and abstract interpretation of logic programs, 2004, TOPLAS.

[20]  Jaewook Shin, et al.  Superword-level parallelism in the presence of control flow, 2005, International Symposium on Code Generation and Optimization.

[21]  Bernhard Steffen, et al.  Optimal code motion: theory and practice, 1994, TOPLAS.

[22]  Frank Yellin, et al.  The Java Virtual Machine Specification, 1996.

[23]  Bernhard Steffen, et al.  Partial dead code elimination, 1994, PLDI '94.

[24]  Ron Y. Pinter, et al.  Program optimization and parallelization using idioms, 1991, POPL '91.