A High-Performance FFT Algorithm for Vector Supercomputers

Many traditional algorithms for computing the fast Fourier transform (FFT) on conventional computers are unacceptable for advanced vector and parallel computers because they involve nonunit, power-of-two memory strides. This paper presents a practical technique for computing the FFT that avoids all such strides and ap pears to be near-optimal for a variety of current vector and parallel computers. Performance results of a pro gram based on this technique are presented. Notable among these results is that a Fortran implementation of this algorithm on the CRAY-2 runs up to 77% faster than Cray's assembly-coded library routine.