Active-learning-based surrogate models for empirical performance tuning

Performance models have a profound impact on hardware-software codesign, architectural exploration, and performance tuning of scientific applications. Developing algebraic performance models, however, is an increasingly challenging task. In such situations, a statistical surrogate-based performance model, fitted to a small number of input-output points obtained from empirical evaluation on the target machine, provides a range of benefits. Accurate surrogates can emulate the output of expensive empirical evaluations at new inputs and can therefore be used to test and/or aid search, compiler, and autotuning algorithms. We present an iterative parallel algorithm that builds surrogate performance models for scientific kernels and workloads on single-core, multicore, and multinode architectures. We tailor an active learning heuristic, popular in the literature on the sequential design of computer experiments, to our parallel environment in order to identify the code variants whose evaluations have the greatest potential to improve the surrogate. We illustrate the effectiveness of the proposed approach in a number of case studies.
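The iterative loop the abstract describes, fitting a surrogate to a few empirical evaluations and using an active-learning criterion to choose the next code variant to measure, can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: the one-dimensional parameter space, the synthetic `empirical_runtime` function, and the nearest-neighbor surrogate with distance-based uncertainty are all stand-ins for the paper's code-variant space, empirical timings, and statistical model.

```python
import random

def empirical_runtime(x):
    # Stand-in for an expensive empirical evaluation on the target machine
    # (hypothetical kernel with best performance near x = 0.3, plus timing noise).
    return (x - 0.3) ** 2 + 0.05 * random.random()

def surrogate_predict(evaluated, x):
    # 1-nearest-neighbor surrogate: predict the runtime of the closest evaluated point.
    _, yi = min(evaluated, key=lambda p: abs(p[0] - x))
    return yi

def uncertainty(evaluated, x):
    # Distance to the nearest evaluated point: a crude stand-in for model variance,
    # which is what sequential-design heuristics typically maximize.
    return min(abs(xi - x) for xi, _ in evaluated)

random.seed(0)
candidates = [i / 100 for i in range(101)]           # discretized tuning parameter
evaluated = [(x, empirical_runtime(x)) for x in (0.0, 1.0)]  # small initial design

for _ in range(10):
    # Active-learning step: evaluate the candidate the surrogate is least sure about,
    # i.e. the variant whose measurement has the best potential to improve the model.
    x_next = max(candidates, key=lambda x: uncertainty(evaluated, x))
    evaluated.append((x_next, empirical_runtime(x_next)))

# The refined surrogate (and the evaluations themselves) can now aid a search:
best_x, best_y = min(evaluated, key=lambda p: p[1])
```

In a parallel setting, the selection step would instead rank many candidates at once and dispatch a batch of evaluations to worker nodes, refitting the surrogate as results arrive.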
