A cost model for a graph-based intermediate representation in a dynamic compiler

Compilers provide many architecture-agnostic, high-level optimizations that trade code size for peak performance. High-level optimizations typically cannot reason precisely about their impact, as they are applied before the final shape of the generated machine code is determined. However, they still need to estimate a transformation's effect on the performance of a compilation unit. Compilers therefore typically model these estimates as trade-off functions that heuristically guide optimization decisions. Compilers such as Graal implement many such handcrafted heuristic trade-off functions, each tuned for one particular high-level optimization. Heuristic trade-off functions base their reasoning on limited knowledge of the compilation unit, often triggering transformations that heavily increase code size or even decrease performance. To address this problem, we propose a cost model for Graal's high-level intermediate representation that models relative operation latencies and operation sizes so that it can be used in the trade-off functions of compiler optimizations. We implemented the cost model in Graal and applied it in two code-duplication-based optimizations. This enabled a more fine-grained code-size trade-off in existing compiler optimizations, reducing their code-size increase by up to 50% compared to not using the proposed cost model, without sacrificing performance. Our evaluation demonstrates that the cost model allows optimizations to perform fine-grained code-size and performance trade-offs that outperform hard-coded heuristics.
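To make the idea concrete, the following is a minimal sketch of what a cost model with per-operation relative latencies and sizes, feeding a duplication trade-off function, could look like. All class, method, and operation names here (`NodeCost`, `CostModelSketch`, `shouldDuplicate`, the cost constants) are illustrative assumptions for this sketch, not Graal's actual API or the paper's concrete model.

```java
// Hedged sketch: names and cost constants are illustrative assumptions,
// not Graal's actual cost-model API.
import java.util.List;

final class NodeCost {
    final int relativeLatency; // estimated cycles, relative to a cheap ALU op
    final int codeSize;        // estimated machine-code bytes

    NodeCost(int relativeLatency, int codeSize) {
        this.relativeLatency = relativeLatency;
        this.codeSize = codeSize;
    }
}

final class CostModelSketch {
    // Assign a coarse relative cost to each IR operation kind.
    static NodeCost costOf(String opKind) {
        switch (opKind) {
            case "Add":    return new NodeCost(1, 4);
            case "Load":   return new NodeCost(4, 4);
            case "Div":    return new NodeCost(32, 8);
            case "Invoke": return new NodeCost(64, 16);
            default:       return new NodeCost(1, 4);
        }
    }

    // A trade-off function in the spirit of the abstract: duplicate a block
    // only if its estimated code-size growth stays within a budget and the
    // expected latency saving justifies that growth (the factor 4 is an
    // arbitrary benefit/cost ratio chosen for this sketch).
    static boolean shouldDuplicate(List<String> blockOps,
                                   int estimatedCyclesSaved,
                                   int maxGrowthBytes) {
        int sizeIncrease = blockOps.stream()
                                   .mapToInt(op -> costOf(op).codeSize)
                                   .sum();
        return sizeIncrease <= maxGrowthBytes
                && estimatedCyclesSaved * 4 >= sizeIncrease;
    }

    public static void main(String[] args) {
        List<String> block = List.of("Load", "Add", "Div"); // 4 + 4 + 8 = 16 bytes
        System.out.println(shouldDuplicate(block, 40, 32)); // within budget: true
        System.out.println(shouldDuplicate(block, 40, 8));  // budget exceeded: false
    }
}
```

The point of such a model is that the same per-node estimates can back the trade-off functions of several optimizations, instead of each optimization carrying its own hard-coded heuristic.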
