A Blocking Algorithm for FFT on Cache-Based Processors

In this paper, we propose a blocking algorithm for computing large one-dimensional fast Fourier transforms (FFTs) on cache-based processors. The proposed algorithm is based on the six-step FFT algorithm. We show that the block six-step FFT algorithm improves performance by using the cache memory effectively. We report performance results for one-dimensional FFTs on a Sun Ultra 10 and a Pentium III PC: about 108 MFLOPS on the Sun Ultra 10 (UltraSPARC-IIi, 333 MHz) and about 247 MFLOPS on the 1 GHz Pentium III PC for a 2^20-point FFT.
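For readers unfamiliar with the six-step FFT on which the proposed algorithm is based, the following is a minimal sketch of the plain (unblocked) six-step decomposition, assuming N = n1 * n2 with both factors powers of two. It is not the blocked variant proposed in this paper, and it makes no attempt at cache efficiency; the function names (`fft`, `transpose`, `six_step_fft`, `naive_dft`) and the pure-Python radix-2 kernel are illustrative assumptions.

```python
import cmath


def fft(a):
    """Recursive radix-2 FFT; len(a) is assumed to be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def six_step_fft(x, n1, n2):
    """Six-step FFT of a list x of length n1 * n2 (illustrative sketch)."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n2 x n1 row-major matrix: rows[j2][j1] = x[j1 + j2 * n1].
    rows = [x[j2 * n1:(j2 + 1) * n1] for j2 in range(n2)]
    # Step 1: transpose to n1 x n2, so each row holds a stride-n1 subsequence.
    a = transpose(rows)
    # Step 2: n1 independent FFTs of length n2.
    a = [fft(row) for row in a]
    # Step 3: multiply element (j1, k2) by the twiddle factor w_N^(j1 * k2).
    a = [[a[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
          for k2 in range(n2)] for j1 in range(n1)]
    # Step 4: transpose to n2 x n1.
    b = transpose(a)
    # Step 5: n2 independent FFTs of length n1.
    b = [fft(row) for row in b]
    # Step 6: transpose to n1 x n2; row-major flattening gives y[k2 + k1 * n2].
    c = transpose(b)
    return [v for row in c for v in row]


def naive_dft(x):
    """O(N^2) reference DFT, used only to check the sketch."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

In this sketch the three transposes (steps 1, 4, and 6) are separate passes over the whole array. A blocking scheme would process the data in cache-sized pieces instead; how that is done here is left to the paper itself.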
