A Volumetric FFT for BlueGene/L

BlueGene/L is a massively parallel supercomputer organized as a three-dimensional torus of compute nodes. A fundamental challenge in harnessing the new computational capabilities of BlueGene/L is the design and implementation of numerical algorithms that scale effectively on thousands of nodes. A computational kernel of particular importance is the Fast Fourier Transform (FFT) of three-dimensional data. In this paper, we present the approach we are taking on BlueGene/L to produce a scalable FFT implementation. We rely on a volume decomposition of the data to take advantage of the toroidal communication topology. We present experimental results from an MPI-based implementation of our algorithm, which lets us test the basic tenets behind our decomposition and experiment on existing platforms. Our preliminary results indicate that our algorithm scales well on as many as 512 nodes for three-dimensional FFTs of size 128 × 128 × 128.
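To make the idea of a volume decomposition concrete, the sketch below splits an n × n × n array into equal subcubes, one per node of a three-dimensional process grid. It also lists the nodes that lie on one line of the torus along a given axis. The function names `local_block` and `line_peers` are hypothetical. The 8 × 8 × 8 arrangement of the 512 nodes is an assumption, since the abstract does not state the node layout. This is a simplified block decomposition for illustration, not the authors' implementation. It shows why such a layout suits a torus: a one-dimensional FFT along an axis needs only the data held by the nodes in that node's row along the same axis.

```python
from itertools import product


def local_block(n, grid, coords):
    """Return the (start, stop) index range along each axis of the n*n*n array
    owned by the node at process-grid coordinates `coords`.
    Assumes n is divisible by every dimension of `grid` (a simplifying assumption)."""
    ranges = []
    for size, c in zip(grid, coords):
        if n % size:
            raise ValueError("n must be divisible by each grid dimension")
        b = n // size  # edge length of each node's subcube along this axis
        ranges.append((c * b, (c + 1) * b))
    return tuple(ranges)


def line_peers(grid, coords, axis):
    """Return the nodes that share this node's coordinates in the other two
    dimensions, i.e. its row of the torus along `axis`. A 1D FFT along `axis`
    at fixed positions in the other two dimensions only touches data on these nodes."""
    peers = []
    for i in range(grid[axis]):
        c = list(coords)
        c[axis] = i
        peers.append(tuple(c))
    return peers


if __name__ == "__main__":
    n, grid = 128, (8, 8, 8)  # 8 * 8 * 8 = 512 nodes (assumed layout)
    print(local_block(n, grid, (0, 0, 0)))          # ((0, 16), (0, 16), (0, 16))
    print(line_peers(grid, (2, 3, 4), axis=0)[:3])  # [(0, 3, 4), (1, 3, 4), (2, 3, 4)]
    # The subcubes tile the whole array: 512 blocks of 16^3 points each.
    total = sum(
        (hx - lx) * (hy - ly) * (hz - lz)
        for (lx, hx), (ly, hy), (lz, hz) in (
            local_block(n, grid, c) for c in product(*(range(g) for g in grid))
        )
    )
    print(total == n ** 3)  # True
```

Under this assumed layout, each node holds a 16 × 16 × 16 subcube (4096 points) of the 128 × 128 × 128 array. Each FFT phase then involves communication only among the 8 nodes along one torus dimension, not among all 512 nodes.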