High-performance architectures use an ever-increasing number of processors. The Boolean cube network has many independent paths between any pair of processors. It provides both high communication bandwidth and the ability to emulate many other networks without contention for communication channels. Of particular interest for the Fast Fourier Transform (FFT) is the ability to emulate butterfly networks, which define the communication pattern of the FFT. Each node of a Boolean cube network of N nodes has degree log2 N. For a large number of nodes, and with several nodes to a chip, the number of channels required at the chip boundary may be infeasibly large, and a network with slightly lower connectivity, such as a Cube Connected Cycles network, may be preferable. The communication system is the most critical resource in many high-performance architectures, and its effective use is imperative. We describe FFT algorithms that use both the storage bandwidth and the communication system optimally for an architecture such as the Connection Machine, which has 65,536 processors interconnected in a Boolean-cube-related network. We also describe the necessary data allocation, and the allocation and use of the twiddle factors.
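The link between the butterfly pattern and the Boolean cube can be illustrated with a minimal sketch. It assumes one data element per node and that the node address equals the data index; the function names are hypothetical and are not taken from the paper. Under those assumptions, the two elements combined in any stage of a radix-2 butterfly have indices that differ in exactly one bit, so they sit on adjacent cube nodes and each exchange uses a single cube channel.

```python
from typing import List


def cube_neighbors(node: int, dim: int) -> List[int]:
    """Neighbors of `node` in a Boolean cube of 2**dim nodes.

    Each neighbor is obtained by flipping one address bit, so every
    node has exactly `dim` = log2(N) neighbors.
    """
    return [node ^ (1 << bit) for bit in range(dim)]


def butterfly_partner(index: int, stage: int) -> int:
    """Index paired with `index` in stage `stage` of a radix-2 butterfly."""
    return index ^ (1 << stage)


def butterfly_uses_only_cube_links(dim: int) -> bool:
    """Check that every butterfly exchange connects two adjacent cube nodes.

    Assumes one element per node and node address == data index.
    """
    n = 1 << dim
    return all(
        butterfly_partner(i, stage) in cube_neighbors(i, dim)
        for stage in range(dim)
        for i in range(n)
    )
```

For example, `cube_neighbors(0, 3)` lists the three neighbors of node 0 in an 8-node cube, and a Boolean cube with 65,536 = 2^16 nodes would give each node 16 neighbors. The sketch ignores the case of several elements per node, where part of each butterfly stage is local to a node.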