FFTX and SpectralPack: A First Look

We propose FFTX, a new framework for building high-performance FFT-based applications on exascale machines. Complex node architectures introduce multiple levels of parallelism and demand efficient data communication. The current FFTW interface falls short of maximizing performance in such scenarios. FFTX is designed to let application developers leverage expert-level, automatic optimizations while working with a familiar interface. FFTX is backward compatible with FFTW and extends the FFTW interface into an embedded Domain Specific Language (DSL) expressed as a library interface. By means of a SPIRAL-based back end, this enables build-time source-to-source translation and advanced performance optimizations, such as cross-library call optimizations, targeting of accelerators through offloading, and inlining of user-provided kernels. We demonstrate the use of FFTX with the prototypical example of 1D and 3D pruned convolutions and discuss future extensions.
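To make the notion of a 1D pruned convolution concrete, the following is a minimal sketch in plain Python. It is not the FFTX or FFTW API. It assumes that "pruned" means only a leading segment of the input is nonzero and only a window of the output is needed. All function names are hypothetical, and a naive O(N^2) DFT stands in for an optimized FFT call.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; a stand-in for an optimized FFT library call."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            acc += x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out

def pruned_convolution(signal, kernel_hat, out_start, out_len):
    """Circular convolution via the frequency domain (hypothetical helper).

    Only len(signal) inputs are nonzero (input pruning) and only
    out_len outputs starting at out_start are kept (output pruning).
    kernel_hat is the precomputed DFT of the full-length kernel.
    """
    n_full = len(kernel_hat)
    padded = list(signal) + [0.0] * (n_full - len(signal))  # zero-pad input
    spectrum = dft(padded)
    product = [s * k for s, k in zip(spectrum, kernel_hat)]  # pointwise multiply
    full = dft(product, inverse=True)
    return [v.real for v in full[out_start:out_start + out_len]]

def direct_pruned_convolution(signal, kernel, out_start, out_len):
    """Reference implementation: direct circular convolution sum."""
    n_full = len(kernel)
    return [
        sum(signal[j] * kernel[(i - j) % n_full] for j in range(len(signal)))
        for i in range(out_start, out_start + out_len)
    ]
```

This sketch still computes the full-length transforms and discards the unneeded outputs afterwards. A pruning-aware implementation would instead skip the work associated with the zero inputs and the discarded outputs, which is the kind of optimization an interface needs to be able to express.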
