Paper information - Large-scale FFTs and convolutions on Apple hardware

Large-scale FFTs and convolutions on Apple hardware

Impressive FFT performance for large signal lengths can be achieved via a matrix paradigm that exploits the modern concepts of cache, memory, and multicore/multithreading. Each of the large-scale FFT implementations we report herein is built hierarchically on very fast FFTs from the standard OS X Accelerate library. (The hierarchical ideas should apply equally well for low-level FFTs of, say, the OpenCL/GPU variety.) By building on such established, packaged, small-length FFTs, one can achieve on a single Apple machine—and even for signal lengths into the billions—sustained processing rates in the multi-gigaflop/s region.
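The matrix paradigm mentioned in the abstract can be illustrated with the classic "four-step" factorization: a length-N signal with N = rows × cols is viewed as a matrix, small FFTs are applied along one axis, the result is multiplied by twiddle factors, small FFTs are applied along the other axis, and the result is transposed. The following is a minimal sketch under stated assumptions, not the paper's implementation: it uses NumPy's `np.fft.fft` as a stand-in for the small packaged FFTs (the paper builds on the Accelerate library), and the function name `four_step_fft` and its parameters are hypothetical.

```python
import numpy as np


def four_step_fft(x, rows, cols):
    """Length rows*cols DFT assembled from small FFTs (illustrative sketch).

    Assumption: np.fft.fft plays the role of the fast small-length FFT.
    """
    x = np.asarray(x, dtype=np.complex128)
    n = rows * cols
    if x.shape != (n,):
        raise ValueError("x must be one-dimensional with length rows * cols")

    # View the signal as a rows x cols matrix: a[r, c] = x[r * cols + c].
    a = x.reshape(rows, cols)

    # Step 1: length-`rows` FFT down each column.
    b = np.fft.fft(a, axis=0)

    # Step 2: multiply entry (k1, c) by the twiddle exp(-2*pi*i*k1*c/n).
    k1 = np.arange(rows).reshape(rows, 1)
    c = np.arange(cols).reshape(1, cols)
    b = b * np.exp(-2j * np.pi * (k1 * c) / n)

    # Step 3: length-`cols` FFT along each row.
    d = np.fft.fft(b, axis=1)

    # Step 4: transpose, since output index k = k1 + rows * k2 is held in d[k1, k2].
    return d.T.reshape(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(128) + 1j * rng.standard_normal(128)
    result = four_step_fft(signal, rows=8, cols=16)
    print(np.allclose(result, np.fft.fft(signal)))
```

In this sketch every large transform reduces to batches of small transforms that fit in cache, which is the property a hierarchical scheme relies on. How the batches are tiled, transposed in memory, and distributed across threads is left out here and would depend on the target machine.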

[1] Jason Klivington, et al. Supercomputer-style FFT library for Apple G4, 2000.

[2] C. Pomerance, et al. Prime Numbers: A Computational Perspective, 2002.

[3] Steven G. Johnson, et al. A Modified Split-Radix FFT With Fewer Arithmetic Operations, 2007, IEEE Transactions on Signal Processing.

[4] T. Lundy, et al. A new matrix approach to real FFTs and convolutions of length 2^k, 2007, Computing.

[5] Paul N. Swarztrauber, et al. FFT algorithms for vector computers, 1984, Parallel Comput.

[6] Steven G. Johnson, et al. The Design and Implementation of FFTW3, 2005, Proceedings of the IEEE.

[7] W. M. Gentleman, et al. Fast Fourier Transforms: for fun and profit, 1966, AFIPS '66 (Fall).

[8] David H. Bailey, et al. FFTs in external or hierarchical memory, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).