Generating Optimized Fourier Interpolation Routines for Density Functional Theory Using SPIRAL

Upsampling a multidimensional dataset is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, as seen in the quantum chemistry code ONETEP, a time-shift based implementation can be a good choice: it uses the Fourier shift property in the frequency domain to shift the samples by a fraction of the original grid spacing, and the shifted samples fill in the intermediate values. This approach leverages readily available, highly optimized multidimensional FFT implementations at the expense of extra passes through the entire working set. In this paper we present an optimized variant of time-shift based upsampling. Since ONETEP handles threading, we address the memory hierarchy and SIMD vectorization, and focus on problem dimensions relevant to ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations, which use highly optimized FFT libraries (FFTW and MKL). We demonstrate speed-ups averaging 3x in isolation and of up to 15% within ONETEP.
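The time-shift idea can be illustrated with a minimal one-dimensional sketch for an upsampling factor of 2. This is not the SPIRAL-generated code and not ONETEP's implementation: the function names (`dft`, `shift_samples`, `upsample2`) are hypothetical, the signal is assumed periodic and real, and a naive O(n^2) DFT stands in for an optimized FFT library only to keep the example self-contained.

```python
import cmath
import math


def dft(x, sign):
    """Naive unnormalized DFT; sign=-1 is the forward, sign=+1 the inverse kernel."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def shift_samples(x, frac):
    """Sample the band-limited periodic interpolant of x at positions j + frac."""
    n = len(x)
    spectrum = dft(x, -1)
    shifted = []
    for k, coeff in enumerate(spectrum):
        if 2 * k == n:
            # Nyquist bin (even n only): a cosine factor keeps real input real.
            factor = math.cos(math.pi * frac)
        else:
            # Map bin index to a signed frequency, then apply the shift phase.
            signed_k = k if 2 * k < n else k - n
            factor = cmath.exp(2j * math.pi * signed_k * frac / n)
        shifted.append(coeff * factor)
    # Inverse transform with 1/n normalization; input is assumed real.
    return [value.real / n for value in dft(shifted, +1)]


def upsample2(x):
    """Interleave the original samples with samples shifted by half a grid spacing."""
    half = shift_samples(x, 0.5)
    out = []
    for original, between in zip(x, half):
        out.extend([original, between])
    return out


if __name__ == "__main__":
    coarse = [math.cos(2 * math.pi * j / 8) for j in range(8)]
    fine = upsample2(coarse)
    print([round(v, 4) for v in fine])
```

In this sketch each additional set of intermediate samples costs one extra inverse transform over the whole dataset, which corresponds to the extra passes through the working set mentioned above. In three dimensions the same shift would be applied along each axis.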
