Automatic performance model construction for the fast software exploration of new hardware designs

Developing an optimizing compiler for a newly proposed architecture is extremely difficult when there is only a simulator of the machine available. Designing such a compiler requires running many experiments in order to understand how different optimizations interact. Given that simulators are orders of magnitude slower than real processors, such experiments are highly restricted. This paper develops a technique to automatically build a performance model for predicting the impact of program transformations on any architecture, based on a limited number of automatically selected runs. As a result, the time for evaluating the impact of any compiler optimization in early design stages can be drastically reduced such that all selected potential compiler optimizations can be evaluated. This is achieved by first evaluating a small set of sample compiler optimizations on a prior set of benchmarks in order to train a model, followed by a very small number of evaluations, or probes, of the target program. We show that by training on less than 0.7% of all possible transformations (640 samples collected from 10 benchmarks out of 880,000 possible samples, 88,000 per training benchmark) and probing the new program on only 4 transformations, we can predict the performance of all program transformations with an error of just 7.3% on average. As each prediction takes almost no time to generate, this scheme provides an accurate method of evaluating compiler performance, which is several orders of magnitude faster than current approaches.
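The train-then-probe idea can be illustrated with a minimal sketch. This is not the paper's model: the abstract does not specify the learning method, so the inverse-error weighting below, the function names (`probe_weights`, `predict`), and the toy speedup values are all assumptions chosen for illustration. The sketch weights each training benchmark by how closely its measured speedups agree with the new program's probe results, then predicts an unprobed transformation as a weighted average.

```python
# Hypothetical sketch: names, data, and the weighting rule are illustrative
# assumptions, not the model described in the paper.

def probe_weights(train_speedups, probes, eps=1e-9):
    """Weight each training benchmark by its agreement with the probes.

    train_speedups: {benchmark: {transformation_id: speedup}}
    probes: {transformation_id: measured speedup of the new program}
    """
    raw = {}
    for name, speedups in train_speedups.items():
        # Mean squared disagreement on the probed transformations only.
        mse = sum((speedups[t] - s) ** 2 for t, s in probes.items()) / len(probes)
        raw[name] = 1.0 / (mse + eps)
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def predict(train_speedups, weights, transformation_id):
    """Predict the new program's speedup as a weighted average."""
    return sum(w * train_speedups[name][transformation_id]
               for name, w in weights.items())


# Toy data: two training benchmarks, four transformations (ids 0..3).
train = {
    "bench_a": {0: 1.00, 1: 1.20, 2: 0.90, 3: 1.50},
    "bench_b": {0: 1.00, 1: 0.80, 2: 1.10, 3: 0.70},
}
# The new program is simulated for two probe transformations only.
probes = {1: 1.20, 2: 0.90}

weights = probe_weights(train, probes)
predicted = predict(train, weights, 3)  # transformation 3 was never simulated

# Sampling fraction from the figures quoted in the abstract:
# 640 samples out of 880,000, well within the stated bound of 0.7%.
sample_fraction = 640 / 880_000

print(weights, predicted, sample_fraction)
```

In this toy case the probes match `bench_a` exactly, so nearly all of the weight goes to that benchmark and the prediction for transformation 3 follows its speedup. The costly step, simulating the new program, happens only for the probes; every other prediction is a cheap lookup and sum.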
