Optimizing the Fast Fourier Transform Using Mixed Precision on Tensor Core Hardware

The Fast Fourier Transform (FFT) is a fundamental tool in scientific and technical computing. Its highly parallelizable structure makes it a natural candidate for GPU acceleration. This paper focuses on exploiting the speedup offered by the half-precision multiplication capability of the tensor core hardware in recent GPUs, without significantly degrading the precision of the Fourier transform result. We develop an algorithm that dynamically splits the single-precision input data into two half-precision sets at the lowest level, performs the multiplications in half precision, and recombines the results at a later step. This work paves the way for using tensor cores with higher-precision inputs.
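The split-and-recombine idea can be emulated on a CPU to see why it recovers accuracy. The following is a minimal sketch in pure Python, not the authors' implementation. It uses the `struct` module's `'e'` (binary16) and `'f'` (binary32) formats to round values the way fp16 inputs and fp32 accumulation on tensor cores would, and evaluates a naive DFT as a stand-in for the small dense products at the lowest level of an FFT. The residual scale factor `SCALE`, the helper names (`split`, `mul_half`, `mul_split`, `dft`), and the choice to split the twiddle factors as well as the input are assumptions made for illustration.

```python
import cmath
import math
import struct

# Hypothetical residual scale factor (a power of two, so scaling is exact).
# It lifts the low part out of the fp16 subnormal range before rounding.
SCALE = 2.0 ** 11


def to_half(x):
    """Round to the nearest IEEE 754 binary16 value (OverflowError if |x| > 65504)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split(x):
    """Split x into fp16 parts (hi, lo) such that x ~= hi + lo / SCALE."""
    hi = to_half(x)
    lo = to_half((x - hi) * SCALE)
    return hi, lo


def mul_half(a, b):
    """Baseline: round both operands to fp16, keep the product in fp32."""
    return to_single(to_half(a) * to_half(b))


def mul_split(a, b):
    """Rebuild a*b from fp16 pieces with fp32 accumulation; the lo*lo term is dropped."""
    ah, al = split(a)
    bh, bl = split(b)
    main = to_single(ah * bh)  # exact: an 11-bit x 11-bit product fits in fp32
    cross = to_single(ah * bl + al * bh)
    return to_single(main + cross / SCALE)  # recombine in fp32


def dft(x, mul):
    """Naive O(n^2) DFT; every real product goes through `mul`, partial sums are rounded to fp32."""
    n = len(x)
    out = []
    for k in range(n):
        re = im = 0.0
        for t, v in enumerate(x):
            w = cmath.exp(-2j * math.pi * t * k / n)
            re = to_single(re + mul(v.real, w.real) - mul(v.imag, w.imag))
            im = to_single(im + mul(v.real, w.imag) + mul(v.imag, w.real))
        out.append(complex(re, im))
    return out
```

Dropping the `lo * lo` term costs at most about 2^-22 relative error per product, far below the 2^-11 unit roundoff of plain fp16. As a result, `dft(x, mul_split)` should track a double-precision reference much more closely than `dft(x, mul_half)`, at the cost of three half-precision products per real multiplication instead of one.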
