Compute intensity and the FFT

This paper describes how high-compute-intensity programming techniques, combined with algorithms from the literature, can yield efficient single- and multi-dimensional FFTs on large numbers of processors on the CRAY APP. The CRAY APP is a shared-memory parallel computer based on the Intel i860 microprocessor. It incorporates up to 84 i860s in an architecture that allows very efficient gang scheduling and barrier synchronization. FFT performance figures for various data set sizes and processor configurations are included.
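To make the notion of compute intensity concrete, the following minimal sketch counts operations for two standard FFT butterflies. It assumes that compute intensity means floating-point operations per real-valued memory reference; the abstract does not define the term, so this reading is an assumption. The function name `compute_intensity` and the cost model (6 real flops per complex multiply, 2 per complex add, twiddle factors loaded from memory, no register reuse between butterflies) are illustrative and are not taken from the paper.

```python
def compute_intensity(complex_mults, complex_adds, complex_loads, complex_stores):
    """Flops per real memory reference for one butterfly (assumed definition)."""
    # A complex multiply costs 4 real multiplies + 2 real adds = 6 flops.
    # A complex add or subtract costs 2 real adds = 2 flops.
    flops = 6 * complex_mults + 2 * complex_adds
    # Each complex value moved to or from memory is 2 real words.
    mem_ops = 2 * (complex_loads + complex_stores)
    return flops / mem_ops


# Radix-2 butterfly: 1 twiddle multiply, 2 complex adds,
# loads 2 data points + 1 twiddle, stores 2 data points.
radix2 = compute_intensity(complex_mults=1, complex_adds=2,
                           complex_loads=3, complex_stores=2)

# Radix-4 butterfly: 3 twiddle multiplies, 8 complex adds
# (multiplication by +/-i is a swap and sign change, counted as free),
# loads 4 data points + 3 twiddles, stores 4 data points.
radix4 = compute_intensity(complex_mults=3, complex_adds=8,
                           complex_loads=7, complex_stores=4)

print(f"radix-2: {radix2:.2f} flops per memory reference")
print(f"radix-4: {radix4:.2f} flops per memory reference")
```

Under these assumptions the radix-4 butterfly performs more arithmetic per word moved than the radix-2 butterfly, which is the kind of ratio a high-compute-intensity formulation aims to raise. The actual techniques used on the CRAY APP may differ from this simple count.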
