Dynamic feedback: an effective technique for adaptive computing

This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment. We have implemented dynamic feedback in the context of a parallelizing compiler for object-based programs. The generated code uses dynamic feedback to automatically choose the best synchronization optimization policy. Our experimental results show that the synchronization optimization policy has a significant impact on the overall performance of the computation, that the best policy varies from program to program, that the compiler is unable to statically choose the best policy, and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy. We have also performed a theoretical analysis which provides, under certain assumptions, a guaranteed optimality bound for dynamic feedback relative to a hypothetical (and unrealizable) optimal algorithm that uses the best policy at every point during the execution.
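The following is a minimal sketch, not the authors' implementation, of the sampling/production execution structure the abstract describes: several versions of the same computation (here the hypothetical run_with_policy_a/b/c, standing in for compiler-generated versions with different synchronization optimization policies) are timed during a sampling phase, and the fastest one is then run for a production interval before the loop resamples. The interval sizes and timing mechanism are illustrative assumptions.

```cpp
#include <algorithm>
#include <array>
#include <chrono>
#include <cstddef>
#include <functional>

using Clock = std::chrono::steady_clock;

// Placeholder bodies standing in for compiler-generated versions; in the
// real system each would execute the same computation under a different
// synchronization optimization policy.
void run_with_policy_a(std::size_t n) { volatile std::size_t s = 0; for (std::size_t i = 0; i < n; ++i) s += i; }
void run_with_policy_b(std::size_t n) { volatile std::size_t s = 0; for (std::size_t i = 0; i < n; ++i) s += 2 * i; }
void run_with_policy_c(std::size_t n) { volatile std::size_t s = 0; for (std::size_t i = 0; i < n; ++i) s += 3 * i; }

void dynamic_feedback_loop(std::size_t total_work,
                           std::size_t sample_units,      // work executed per sampling measurement
                           std::size_t production_units)  // work executed per production phase
{
    std::array<std::function<void(std::size_t)>, 3> versions = {
        run_with_policy_a, run_with_policy_b, run_with_policy_c};

    std::size_t done = 0;
    while (done < total_work) {
        // Sampling phase: time each version on a short slice of the work
        // and remember the one with the least measured overhead.
        std::size_t best = 0;
        auto best_time = Clock::duration::max();
        for (std::size_t i = 0; i < versions.size() && done < total_work; ++i) {
            std::size_t slice = std::min(sample_units, total_work - done);
            auto start = Clock::now();
            versions[i](slice);
            auto elapsed = Clock::now() - start;
            done += slice;
            if (elapsed < best_time) { best_time = elapsed; best = i; }
        }

        // Production phase: run the best-measured version until it is
        // time to resample and adapt to any change in the environment.
        std::size_t remaining = std::min(production_units, total_work - done);
        versions[best](remaining);
        done += remaining;
    }
}
```

In this sketch the sampling and production interval lengths are fixed parameters; the paper's analysis concerns how such interval choices bound the worst-case overhead relative to always using the best policy.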
