The benefits and costs of DyC's run-time optimizations

DyC selectively compiles parts of a program dynamically, during execution, using the run-time-computed values of variables and data structures to apply optimizations based on partial evaluation. To reduce their run-time cost, the dynamic optimizations are preplanned at static compile time; we call this staging. DyC's staged optimizations include (1) an advanced binding-time analysis that supports polyvariant specialization (enabling both single-way and multiway complete loop unrolling), polyvariant division, static loads, and static calls; (2) low-cost, dynamic versions of traditional global optimizations, such as zero and copy propagation and dead-assignment elimination; and (3) dynamic peephole optimizations, such as strength reduction. Because of this large suite of optimizations and its low dynamic compilation overhead, DyC achieves good performance improvements on programs that are larger and more complex than the kernels previously targeted by other dynamic compilation systems. This paper evaluates the benefits and costs of applying DyC's optimizations. We assess their impact on the performance of a variety of small to medium-sized programs, both for the regions of code that are actually transformed and for the application as a whole. Our study includes an analysis of the contribution of individual optimizations to performance, the performance effect of changing the applications' inputs, and a detailed accounting of dynamic compilation costs.
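To make the optimizations named above concrete, the following is a minimal Python sketch of the general idea of run-time specialization. It is an analogy only: DyC operates on annotated C programs and emits machine code, and none of the names here (`dot`, `specialize_dot`, `dot_specialized`) come from DyC. The sketch assumes a dot product whose integer weights become known at run time; it generates a specialized function in which the loop is completely unrolled, zero-weight terms are dropped (zero propagation), and multiplications by powers of two become shifts (strength reduction).

```python
def dot(weights, xs):
    """General version: re-reads the weights and loops on every call."""
    total = 0
    for i in range(len(weights)):
        total += weights[i] * xs[i]
    return total


def specialize_dot(weights):
    """Generate a version of dot() specialized to fixed integer weights.

    Hypothetical helper for illustration; returns the generated function
    and its source text.
    """
    terms = []
    for i, w in enumerate(weights):          # complete loop unrolling
        if w == 0:
            continue                         # zero propagation: drop the term
        elif w == 1:
            terms.append(f"xs[{i}]")         # multiplication by 1 removed
        elif w > 1 and w & (w - 1) == 0:     # w is a power of two
            shift = w.bit_length() - 1
            terms.append(f"(xs[{i}] << {shift})")  # strength reduction (ints only)
        else:
            terms.append(f"({w} * xs[{i}])")
    body = " + ".join(terms) if terms else "0"
    source = f"def dot_specialized(xs):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)                  # "dynamic compilation" step
    return namespace["dot_specialized"], source


# Usage: specialize once, then call the specialized code many times.
weights = [0, 1, 8, 3]
fast_dot, source = specialize_dot(weights)
print(source)
```

The cost of building and compiling the specialized function is paid once, so the approach only pays off if the specialized code runs often enough to amortize it. That trade-off between benefit and dynamic compilation cost is what the paper measures for DyC.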
