An efficient parallel algorithm for the 3-D FFT NAS parallel benchmark

We propose an efficient parallel algorithm for the 3-D FFT benchmark of the NAS Parallel Benchmarks. The algorithm overlaps communication with computation. On parallel machines that support this overlap, it can outperform the non-overlapping version of the same algorithm by a factor close to two.
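
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of overlapping communication with computation in a slab-decomposed 3-D transform, not the authors' method. Everything here is an assumption made for illustration. The names `dft`, `transform_slab`, `send_slab`, `fft3d_overlapped`, and `ideal_speedup` are hypothetical. The "communication" is simulated by a background thread that scatters each transformed slab into a transposed buffer, and a naive DFT stands in for an optimized FFT kernel. Python threads do not give real parallelism for this workload, so the sketch shows the pipeline structure rather than an actual speedup. The function `ideal_speedup` encodes a simple timing model, also an assumption: without overlap a step costs `t_comp + t_comm`, with perfect overlap it costs `max(t_comp, t_comm)`, which yields a factor of two when the two costs are equal.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dft(values):
    """Naive O(n^2) 1-D DFT; stands in for an optimized FFT kernel."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transform_slab(slab):
    """Compute step: 2-D transform of one z-slab, indexed slab[y][x]."""
    rows = [dft(row) for row in slab]                 # transform along x
    cols = [dft(list(col)) for col in zip(*rows)]     # transform along y
    return [list(row) for row in zip(*cols)]          # back to [ky][kx]


def send_slab(z, slab, pencils):
    """Communication step (simulated): scatter slab z into z-pencils."""
    for y, row in enumerate(slab):
        for x, value in enumerate(row):
            pencils[y][x][z] = value


def fft3d_overlapped(grid):
    """3-D DFT of grid[z][y][x]; the result is indexed [ky][kx][kz]."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    pencils = [[[0j] * nz for _ in range(nx)] for _ in range(ny)]
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = None
        for z in range(nz):
            # Compute slab z while the transfer of slab z-1 is in flight.
            slab = transform_slab(grid[z])
            if pending is not None:
                pending.result()                      # wait for slab z-1
            pending = comm.submit(send_slab, z, slab, pencils)
        if pending is not None:
            pending.result()                          # drain the last transfer
    # Final 1-D transforms along z, once all data has arrived.
    return [[dft(pencils[y][x]) for x in range(nx)] for y in range(ny)]


def ideal_speedup(t_comp, t_comm):
    """Idealized speedup of perfect overlap over no overlap (assumed model)."""
    return (t_comp + t_comm) / max(t_comp, t_comm)
```

Calling `fft3d_overlapped(grid)` on a nested list `grid[z][y][x]` of complex values returns the transform in the transposed layout `[ky][kx][kz]`, which avoids a second data exchange at the end. In a message-passing setting the background thread would typically be replaced by non-blocking sends and receives, with the wait placed where `pending.result()` appears.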
