Efficient FFT on Torus Multicomputers: A Performance Study

This paper addresses the problem of computing a one-dimensional FFT on a c-dimensional torus multicomputer. Three approaches are proposed, which differ in how they use the interconnection network of the torus. The first is based on the multidimensional index mapping technique for FFT computation. The second embeds on the torus a hypercube algorithm for computing the radix-2 Cooley-Tukey FFT. The third reduces the communication cost of the hypercube algorithm through communication pipelining. Analytical models are presented to compare the approaches, and performance estimates are given to illustrate the comparison.
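As a rough illustration of the multidimensional index mapping idea mentioned above, the following sequential sketch rewrites a length-N DFT, with N = n1 * n2, as a set of short DFTs along one index, a twiddle-factor multiplication, and short DFTs along the other index. It is the textbook two-factor Cooley-Tukey mapping, not necessarily the exact scheme used in the paper, and it does not model the torus or any communication. The function names `dft` and `fft_index_mapped` and the parameters `n1` and `n2` are hypothetical, chosen only for this example.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used as the short transform and as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_index_mapped(x, n1, n2):
    """Length n1*n2 DFT via a two-dimensional index mapping (assumed scheme).

    Input index:  j = j1 * n2 + j2   (0 <= j1 < n1, 0 <= j2 < n2)
    Output index: k = k1 + n1 * k2   (0 <= k1 < n1, 0 <= k2 < n2)
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: for each fixed j2, a length-n1 DFT over j1.
    cols = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i * j2 * k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Step 3: for each fixed k1, a length-n2 DFT over j2.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[j2][k1] for j2 in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out
```

In a parallel setting, the two groups of short DFTs would typically be assigned to different dimensions of the processor array, with data redistribution between them; that part is outside the scope of this sketch.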
