Most implementations of a radix-2 fast Fourier transform on large scientific computers use algorithms that involve memory accesses whose strides are powers of two. (The term stride means the memory increment between successive elements stored or fetched.) Such strides are unacceptable for recently developed supercomputers, particularly the Cray-2, because of serious difficulties with memory bank conflicts. This article describes an algorithm for evaluating the fast Fourier transform that avoids this difficulty and thus could provide the basis for implementations that more fully utilize the power of the Cray-2. A Fortran program implementing this algorithm is included, and timing comparisons with the Cray assembly-coded library subroutine are shown.
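
The effect of power-of-two strides on banked memory can be illustrated with a minimal sketch. It assumes a simple interleaved memory model in which the word at address `a` resides in bank `a % n_banks`. The function name `banks_touched` and the bank count of 64 are illustrative assumptions, not Cray-2 specifications, and the sketch is not part of the algorithm described in the article.

```python
def banks_touched(stride, n_accesses, n_banks=64):
    """Count the distinct banks hit by n_accesses strided word accesses.

    Assumed model (hypothetical): word address a lives in bank a % n_banks.
    """
    return len({(i * stride) % n_banks for i in range(n_accesses)})


if __name__ == "__main__":
    # Compare power-of-two strides with unit and odd strides.
    for stride in (1, 2, 16, 64, 65):
        print(f"stride {stride:3d}: {banks_touched(stride, 64)} distinct banks")
```

Under this model, a stride that shares a large power-of-two factor with the bank count concentrates accesses on a few banks (a stride of 64 hits a single bank), whereas an odd stride such as 65 spreads the same accesses across all 64 banks.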