Parallel implementation of 1-D fast Fourier transform without inter-processor communications

Computing the 1-D fast Fourier transform (FFT) with the classical 4-step FFT on parallel computers requires intensive all-to-all communication, which significantly reduces FFT performance. In this paper, we present the no-communication algorithm, a parallel algorithm for the 1-D FFT that requires no inter-processor communication. Its advantage is the absence of all-to-all communication between processors; its disadvantage is the extra computation it performs compared with the classical 4-step FFT. The no-communication algorithm has been implemented and tested on an 8-node symmetric multiprocessor (SMP) system. The results show that the no-communication algorithm performs better than the 4-step FFT for relatively small data sizes, whereas the 4-step FFT performs better for relatively large data sizes.
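For orientation, the classical 4-step FFT that serves as the baseline can be sketched serially as follows. This is a minimal single-process illustration, not the implementation evaluated in the paper and not the no-communication algorithm itself. The function name `four_step_fft` and the factor arguments `n1` and `n2` are assumptions introduced here, and NumPy's `np.fft.fft` stands in for the per-processor sub-FFTs. The comments mark where a distributed version would typically need to redistribute data between processors.

```python
import numpy as np


def four_step_fft(x, n1, n2):
    """Serial sketch of the 4-step FFT for a signal of length n1 * n2.

    Hypothetical helper for illustration only; n1 and n2 are the two
    factors of the transform length.
    """
    x = np.asarray(x, dtype=complex)
    if x.size != n1 * n2:
        raise ValueError("len(x) must equal n1 * n2")

    # View the 1-D input as an n1 x n2 matrix: a[j1, j2] = x[n2 * j1 + j2].
    a = x.reshape(n1, n2)

    # Step 1: n2 independent FFTs of length n1 (one per column).
    b = np.fft.fft(a, axis=0)

    # Step 2: multiply by the twiddle factors exp(-2*pi*i * k1 * j2 / N).
    k1 = np.arange(n1).reshape(n1, 1)
    j2 = np.arange(n2).reshape(1, n2)
    b = b * np.exp(-2j * np.pi * k1 * j2 / (n1 * n2))

    # Step 3: n1 independent FFTs of length n2 (one per row).
    # If the columns were spread across processors in step 1, the rows
    # must be gathered here; in a distributed setting this redistribution
    # is the all-to-all exchange.
    c = np.fft.fft(b, axis=1)

    # Step 4: transpose so that X[k1 + n1 * k2] = c[k1, k2].
    return c.T.reshape(-1)
```

A quick check against a direct transform: for a random complex vector `x` of length 32, `np.allclose(four_step_fft(x, 4, 8), np.fft.fft(x))` should hold. Steps 1 and 3 operate along different axes of the matrix, which is why a data layout that keeps one step local forces a global exchange before the other.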
