Design and Implementation of a Lightweight Dynamic Optimization System

Many opportunities exist to improve micro-architectural performance due to performance events that are dicult to optimize at static compile time. Cache misses and branch mis-prediction patterns may vary for dierent micro-architectures using dierent inputs. Dynamic optimization provides an approach to address these and other performance events at runtime. This paper describes a software system of real implementation that detects performance problems of running applications and deploys optimizations to increase execution eciency. We discuss issues of detecting performance bottlenecks, generating optimized traces and redirecting execution from the original code to the dynamically optimized code. Our current system speeds up many of the CPU2000 benchmark programs having large numbers of D-Cache misses through dynamically deployed cache prefetching. For other applications that don’t benefit from our runtime optimization, the average cost is only 2% of execution time. We present this lightweight system as an example of using existing hardware and software to deploy speculative optimizations to improve a program’s runtime performance.

[1]  Matthew Arnold,et al.  Adaptive optimization in the Jalapeño JVM , 2000, OOPSLA '00.

[2]  Michael Franz,et al.  Continuous program optimization: A case study , 2003, TOPL.

[3]  Michael Gschwind,et al.  Optimization and precise exceptions in dynamic compilation , 2001, CARN.

[4]  Brad Calder,et al.  Phase tracking and prediction , 2003, ISCA '03.

[5]  Richard Johnson,et al.  The Transmeta Code Morphing#8482; Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, CGO.

[6]  Yun Wang,et al.  IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium-based systems , 2003, MICRO.

[7]  Todd C. Mowry,et al.  Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.

[8]  Anoop Gupta,et al.  Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.

[9]  Youfeng Wu,et al.  Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching , 2002, PLDI '02.

[10]  Derek Bruening,et al.  An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[11]  James E. Smith,et al.  Comparing Program Phase Detection Techniques , 2003, MICRO.

[12]  Todd C. Mowry,et al.  Predicting data cache misses in non-numeric applications through correlation profiling , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[13]  Wen-mei W. Hwu,et al.  A hardware mechanism for dynamic extraction and relayout of program hot spots , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[14]  John Yates,et al.  FX!32 a profile-directed binary translator , 1998, IEEE Micro.

[15]  Michael Franz,et al.  Continuous Program Optimization: Design and Evaluation , 2001, IEEE Trans. Computers.

[16]  Gurindar S. Sohi,et al.  Effective jump-pointer prefetching for linked data structures , 1999, ISCA.

[17]  James M. Stichnoth,et al.  Practicing JUDO: Java under dynamic optimizations , 2000, PLDI '00.

[18]  Sanjay J. Patel,et al.  rePLay: A Hardware Framework for Dynamic Optimization , 2001, IEEE Trans. Computers.

[19]  James R. Larus,et al.  Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[20]  Wei-Chung Hsu,et al.  The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System , 2003, MICRO.

[21]  Karl Pettis,et al.  Profile guided code positioning , 1990, PLDI '90.

[22]  K. Ebcioglu,et al.  Daisy: Dynamic Compilation For 10o?40 Architectural Compatibility , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[23]  Yun Wang,et al.  IA-32 execution layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium/spl reg/-based systems , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[24]  Robert S. Cohn,et al.  Optimizing Alpha Executables on Windows NT with Spike , 1998, Digit. Tech. J..

[25]  Scott A. Mahlke,et al.  The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.

[26]  Rajeev Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, MICRO 33.

[27]  Cliff Click,et al.  The java hotspot TM server compiler , 2001 .

[28]  Erik R. Altman,et al.  Daisy: Dynamic Compilation For 10o?40 Architectural Compatibility , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[29]  Amitabh Srivastava,et al.  Vulcan Binary transformation in a distributed environment , 2001 .

[30]  Richard Johnson,et al.  The Transmeta Code Morphing/spl trade/ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[31]  James E. Smith,et al.  Managing multi-configuration hardware via dynamic working set analysis , 2002, ISCA.

[32]  Harish Patil,et al.  Profile-guided post-link stride prefetching , 2002, ICS '02.