Can search algorithms save large-scale automatic performance tuning?

Empirical performance optimization of computer codes using autotuners has received significant attention in recent years. Given the increasing complexity of computer architectures and scientific codes, evaluating all possible code variants is prohibitively expensive for all but the simplest kernels. One way for autotuners to overcome this hurdle is through the use of a search algorithm that finds high-performing code variants while examining relatively few of them. In this paper we examine the search problem in autotuning from a mathematical optimization perspective. To illustrate the power and limitations of this perspective, we conduct an experimental study of several optimization algorithms on a number of linear algebra kernel codes. We find that the algorithms considered obtain performance gains similar to the optimal ones found by complete enumeration or by large random searches, but in a tiny fraction of the computation time.
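The contrast between exhaustive enumeration and search can be sketched with a toy example. The sketch below is purely illustrative, not the paper's method: the parameter names (`UNROLL`, `TILE`) and the synthetic `runtime` function stand in for a real tuning space and real empirical timings.

```python
import itertools
import random

# Hypothetical tuning space: loop unroll factors and tile sizes
# (illustrative only; a real autotuner would time generated code variants).
UNROLL = [1, 2, 4, 8, 16]
TILE = [8, 16, 32, 64, 128]

def runtime(unroll, tile):
    """Synthetic stand-in for an empirical timing measurement."""
    return abs(unroll - 4) * 0.7 + abs(tile - 32) * 0.01 + 1.0

# Exhaustive enumeration: evaluates every variant (25 here, but real
# spaces grow combinatorially with the number of tuning parameters).
best_exhaustive = min(itertools.product(UNROLL, TILE),
                      key=lambda v: runtime(*v))

# Random search: samples a small fraction of the space; more
# sophisticated search algorithms aim to do at least this well
# with even fewer evaluations.
random.seed(0)
samples = [(random.choice(UNROLL), random.choice(TILE)) for _ in range(8)]
best_random = min(samples, key=lambda v: runtime(*v))

print("exhaustive best:", best_exhaustive)
print("random-search best:", best_random)
```

With a real kernel, each call to `runtime` would compile and time a code variant, so reducing the number of evaluations translates directly into reduced tuning time.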
