Tuning high performance kernels through empirical compilation

A few application areas remain almost untouched by the historical and continuing advancement of compilation research. At the extremes of optimization, with high performance computing at one end of the spectrum and embedded systems at the other, many critical routines are still hand-tuned, often directly in assembly. At the same time, architecture implementations perform an increasing number of compiler-like transformations in hardware, which makes it harder to predict the performance impact of a given series of optimizations applied at the ISA level. These issues, together with the rate of hardware evolution dictated by Moore's Law, make it almost impossible to keep key kernels running at peak efficiency. Automated empirical systems, in which direct timings guide optimization, have provided the most successful response to these challenges. This paper describes our approach to empirical optimization, which uses a low-level iterative compilation framework specialized for optimizing high performance computing kernels. We present results showing that this approach not only provides speedups over traditional optimizing compilers, but also improves overall performance when compared to the best hand-tuned kernels selected by the empirical search of our well-known ATLAS package.
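The idea of letting direct timings guide optimization can be illustrated with a minimal sketch. This is not the framework described in the paper, which operates at a low level inside an iterative compiler; it is a toy search in Python under stated assumptions. The names `time_candidate`, `empirical_search`, and `make_blocked_sum`, and the use of a block size as the only tuning parameter, are hypothetical and chosen purely for illustration.

```python
import time


def time_candidate(kernel, arg, repeats=3, timer=time.perf_counter):
    """Run `kernel(arg)` several times and return the minimum elapsed time."""
    best = float("inf")
    for _ in range(repeats):
        start = timer()
        kernel(arg)
        best = min(best, timer() - start)
    return best


def empirical_search(make_kernel, candidates, arg, measure=time_candidate):
    """Build one kernel per candidate parameter, time each, keep the fastest.

    Returns the winning parameter and the full table of measured timings.
    """
    timings = {}
    for params in candidates:
        timings[params] = measure(make_kernel(params), arg)
    best = min(timings, key=timings.get)
    return best, timings


def make_blocked_sum(block):
    """Hypothetical kernel generator: sum a list in chunks of `block` items."""
    def kernel(data):
        total = 0.0
        for i in range(0, len(data), block):
            total += sum(data[i:i + block])
        return total
    return kernel


if __name__ == "__main__":
    data = [1.0] * 100_000
    best, timings = empirical_search(make_blocked_sum, [1, 8, 64, 512], data)
    print("fastest block size on this machine:", best)
```

The search makes no attempt to model the hardware: it simply measures each variant on the target machine and keeps the winner, which is why the selected block size may differ from one machine or run to the next. Taking the minimum over repeated runs is one common way to reduce timing noise; other choices are possible.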
