Deconstructing iterative optimization

Iterative optimization is a popular compiler optimization approach that has been studied extensively over the past decade. In this article, we deconstruct iterative optimization by evaluating whether it works across datasets and by analyzing why it works. Until now, most iterative optimization studies have rested on a premise that was never truly evaluated: that it is possible to learn the best compiler optimizations across datasets. In this article, we evaluate this question for the first time with a very large number of datasets. To that end, we compose KDataSets, a dataset suite with 1000 datasets for 32 programs, which we release to the public. We characterize the diversity of KDataSets and subsequently use it to evaluate iterative optimization. For all 32 programs, we find that there exists at least one combination of compiler optimizations that achieves 83% or more of the best possible speedup across all datasets on two widely used compilers (Intel's ICC and GNU's GCC). This optimal combination is program-specific and yields speedups of up to 3.75× (averaged across the datasets of a program) over the highest optimization level of the compilers (-O3 for GCC and -fast for ICC). This finding suggests that optimizing programs across datasets may be much easier than previously anticipated. In addition, we evaluate the idea of introducing compiler choice as part of iterative optimization, and find that it can further improve performance because different programs favor different compilers. We also investigate why iterative optimization works by analyzing the optimal combinations, and find that only a handful of optimizations yield most of the speedup. Finally, we show through two case studies that optimizations interact in complex and sometimes counterintuitive ways, which confirms that iterative optimization is an important and irreplaceable compiler strategy.
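The measurement behind the 83% claim can be pictured as a simple search harness: compile the program under many flag combinations, time each build on every dataset, and check how close a single combination comes to the per-dataset best. The sketch below is a minimal illustration of that idea, not the paper's actual infrastructure: the flag pool, the benchmark.c source, the dataset paths, and the build/run_time helpers are all assumptions for illustration, and the real study spans 32 programs, 1000 datasets each, and a far larger optimization space.

```python
import itertools
import subprocess
import time

# Hypothetical inputs; the paper's real flag space and datasets are much larger.
FLAG_POOL = ["-funroll-loops", "-fomit-frame-pointer", "-ftree-vectorize",
             "-finline-functions", "-fno-strict-aliasing"]
DATASETS = [f"inputs/dataset_{i}.txt" for i in range(1000)]

def build(flags):
    """Compile the benchmark with GCC -O3 plus the candidate flags."""
    subprocess.run(["gcc", "-O3", *flags, "benchmark.c", "-o", "benchmark"],
                   check=True)

def run_time(dataset):
    """Wall-clock time of one run on one dataset."""
    start = time.perf_counter()
    subprocess.run(["./benchmark", dataset], check=True)
    return time.perf_counter() - start

# Baseline: plain -O3 on every dataset.
build([])
baseline = {d: run_time(d) for d in DATASETS}

# Evaluate a small sample of flag combinations and record the speedup of
# each combination on each dataset.
speedup = {}   # (combination, dataset) -> speedup over -O3
combos = [c for r in (1, 2) for c in itertools.combinations(FLAG_POOL, r)]
for combo in combos:
    build(list(combo))
    for d in DATASETS:
        speedup[(combo, d)] = baseline[d] / run_time(d)

# Best achievable speedup per dataset among the evaluated combinations
# (the per-dataset "oracle").
best_per_dataset = {d: max(speedup[(c, d)] for c in combos) for d in DATASETS}

def worst_fraction(combo):
    """Worst-case fraction of the per-dataset best that this combination
    reaches across all datasets; the paper's finding corresponds to this
    fraction being at least 0.83 for the best program-specific combination."""
    return min(speedup[(combo, d)] / best_per_dataset[d] for d in DATASETS)

best_combo = max(combos, key=worst_fraction)
print(best_combo, worst_fraction(best_combo))
```

In practice such a harness would run each configuration several times and use the median to reduce timing noise; the single-run timing here keeps the sketch short.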
