A Search Optimization in FFTW

Generating high-performance fast Fourier transform (FFT) libraries for different computer architectures is an important task. Architecture vendors sometimes have to rely on dedicated experts to tune the FFT implementation for each new platform. The Fastest Fourier Transform in the West (FFTW) replaces this tedious, repetitive work with an adaptive FFT library. It automatically generates FFT code whose performance is comparable to that of vendor-provided libraries. Part of its success is due to its highly efficient straight-line code for small DFTs, called codelets. The other part is the result of a large and carefully chosen search space of FFT algorithms. FFTW traverses this space mainly by empirical search; otherwise, it uses a simple heuristic. However, both methods have drawbacks. Empirical search spends a lot of time on large DFT problems, and the simple heuristic often delivers an implementation that is much worse than the optimum. An ideal approach would find a reasonably good implementation within the FFT search space in a small amount of time. Model-driven optimization is often believed to be inferior to empirical search. It is very hard to capture all the performance features of an adaptive library on many modern architectures. No one has implemented an adaptive performance model to automatically assist the search of FFT algorithms on multiple architectures. This thesis presents an implicit abstract machine model and a codelet performance model that can be used in the current FFTW framework. With the performance predictions given by these models, the empirical search engine of FFTW can be replaced without a serious loss of performance. This technique also helps to break down the runtime
