An Effective Empirical Search Method for Automatic Software Tuning

Empirical software optimization and tuning is an active research topic in the high-performance computing community. An empirical tuning system adapts to its target platform by generating optimized software from parameters found through empirical search. Because the parameter search space is large, an appropriate search heuristic is an essential part of such a system. This paper describes an effective search method that can be applied generally to empirical optimization. We apply this method to ATLAS (Automatically Tuned Linear Algebra Software), a system for empirically optimizing dense linear algebra kernels. Our experiments on four different platforms show that the new search scheme finds parameters that lead ATLAS to generate a library with better performance.
