Exploiting statistical correlations for proactive prediction of program behaviors

This paper presents a finding and a technique for program behavior prediction. The finding is that surprisingly strong statistical correlations exist among the behaviors of different program components (e.g., loops) and among different types of program-level behaviors (e.g., loop trip counts versus data values). Furthermore, these correlations can be exploited beneficially: they help resolve the proactivity-adaptivity dilemma faced by existing approaches to program behavior prediction. This makes it possible to combine the strengths of both approaches: the large scope and earliness of predictions based on offline profiling, and the cross-input adaptivity of predictions based on runtime sampling. The main technique contributed by this paper centers on a new concept, seminal behaviors. Motivated by the strong correlations among program behaviors, we propose a regression-based framework that automatically identifies a small set of behaviors from which the other behaviors in a program can be predicted accurately. We call these seminal behaviors. By applying statistical learning techniques, the framework constructs predictive models that map seminal behaviors to other behaviors, enabling proactive and cross-input adaptive prediction of program behaviors. The prediction helps a commercial compiler, the IBM XL C compiler, generate code that runs up to 45% faster (5% to 13% faster on average), demonstrating the large potential of correlation-based techniques for program optimization.
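
To make the idea of seminal behaviors concrete, here is a minimal Python sketch under simplifying assumptions; it is an illustration, not the framework described in the paper. It assumes each behavior is a single number recorded once per training input. It uses univariate ordinary least-squares regression as the statistical learner and picks seminal behaviors with a greedy covering heuristic. A behavior counts as predictable from a seminal behavior when the fit's R² reaches a threshold, set here to 0.95 as an arbitrary choice. The functions `fit_line`, `select_seminal`, `build_models`, and `predict`, and the behavior names and values in the example, are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ a * xs + b; returns (a, b, r2)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0  # a constant predictor carries no information
    b = my - a * mx
    if not syy:
        return a, b, 1.0  # a constant target is predicted exactly by b
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, 1.0 - sse / syy


def select_seminal(profiles, threshold=0.95):
    """Greedily choose seminal behaviors until every behavior is either seminal
    or predictable (R^2 >= threshold) from at least one seminal behavior."""
    names = sorted(profiles)
    # covers[s]: behaviors that s predicts well, including s itself.
    covers = {
        s: {s} | {t for t in names
                  if t != s and fit_line(profiles[s], profiles[t])[2] >= threshold}
        for s in names
    }
    uncovered, seminal = set(names), []
    while uncovered:
        # max() keeps the first (alphabetically smallest) behavior on ties.
        best = max(names, key=lambda c: len(covers[c] & uncovered))
        seminal.append(best)
        uncovered -= covers[best]
    return seminal


def build_models(profiles, seminal):
    """For each non-seminal behavior, keep the best-fitting seminal predictor."""
    models = {}
    for t in profiles:
        if t in seminal:
            continue
        s = max(seminal, key=lambda c: fit_line(profiles[c], profiles[t])[2])
        a, b, _ = fit_line(profiles[s], profiles[t])
        models[t] = (s, a, b)
    return models


def predict(models, seminal_values):
    """Predict every modeled behavior from the observed seminal behaviors."""
    return {t: a * seminal_values[s] + b for t, (s, a, b) in models.items()}


if __name__ == "__main__":
    # Hypothetical training profiles: one value per training input.
    profiles = {
        "loop1_trips": [10, 20, 30, 40],
        "loop2_trips": [21, 41, 61, 81],   # 2 * loop1_trips + 1
        "val_x": [30, 60, 90, 120],        # 3 * loop1_trips
        "loop3_trips": [5, 1, 7, 3],       # uncorrelated with the others
    }
    seminal = select_seminal(profiles)
    models = build_models(profiles, seminal)
    print(seminal)  # ['loop1_trips', 'loop3_trips']
    print(predict(models, {"loop1_trips": 50, "loop3_trips": 2}))
    # {'loop2_trips': 101.0, 'val_x': 150.0}
```

In this sketch, only the seminal behaviors of a new input need to be observed at runtime (for example, early in its execution). The models then supply the remaining behaviors before they occur, which is what makes the prediction both proactive and adaptive to the input.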
