The Impact of Address Arithmetic on the GPU Implementation of Fast Algorithms for the Vilenkin-Chrestenson Transform

This paper considers the impact of address arithmetic in the Cooley-Tukey and constant geometry fast algorithms for the Vilenkin-Chrestenson transform on their implementation on the graphics processing unit (GPU). We examine the use of different transform radices and analyze the number of GPU instructions and the register usage of the OpenCL implementations of both algorithms. We also compare program running times on the GPU and on the central processing unit (CPU). Experiments show that the GPU implementations are 10 to 22 times faster than the C/C++ CPU implementations, depending on the transform radix and the number of variables in the processed function. The OpenCL implementation of the constant geometry algorithm translates into fewer GPU arithmetic and fetch instructions and uses fewer registers. Its processing times are up to 21% shorter than those of the corresponding Cooley-Tukey implementation.
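To make the difference in address arithmetic concrete, the following is a minimal CPU-side sketch in Python, not the OpenCL kernels evaluated in the paper. It assumes an unnormalized basic matrix with entries exp(-2πi·jk/p) (sign and scaling conventions vary in the literature), and all function names are hypothetical. In the Cooley-Tukey variant the butterfly indices depend on a stride that changes with every stage, whereas in the constant geometry variant the read and write index expressions are identical in all stages.

```python
import cmath

def basic_matrix(p):
    # Assumed convention: unnormalized p x p matrix with entries exp(-2*pi*i*j*k/p).
    return [[cmath.exp(-2j * cmath.pi * j * k / p) for k in range(p)]
            for j in range(p)]

def vct_cooley_tukey(x, p, n):
    """In-place style: butterfly addresses depend on the stage (stride = p**s)."""
    W = basic_matrix(p)
    a = list(x)
    stride = 1
    for _ in range(n):
        block = stride * p
        for base in range(0, len(a), block):
            for lo in range(stride):
                # Stage-dependent addressing: the stride changes every stage.
                idx = [base + lo + k * stride for k in range(p)]
                v = [a[i] for i in idx]
                for k in range(p):
                    a[idx[k]] = sum(W[k][j] * v[j] for j in range(p))
        stride = block
    return a

def vct_constant_geometry(x, p, n):
    """Out-of-place: every stage reads p*q + j and writes q + k*(N/p)."""
    W = basic_matrix(p)
    N = p ** n
    m = N // p
    src = list(x)
    for _ in range(n):
        dst = [0j] * N
        for q in range(m):  # on a GPU, q could plausibly map to one work-item
            v = src[p * q: p * q + p]
            for k in range(p):
                # Same index expressions in every stage.
                dst[q + k * m] = sum(W[k][j] * v[j] for j in range(p))
        src = dst
    return src

def vct_direct(x, p, n):
    """Reference O(N^2) evaluation of the n-fold Kronecker power, for checking."""
    N = p ** n

    def digit_dot(u, v):
        # Sum of products of corresponding base-p digits of u and v.
        s = 0
        for _ in range(n):
            s += (u % p) * (v % p)
            u //= p
            v //= p
        return s

    return [sum(x[v] * cmath.exp(-2j * cmath.pi * (digit_dot(u, v) % p) / p)
                for v in range(N))
            for u in range(N)]
```

Both fast variants return the spectrum in natural order and can be checked against `vct_direct` for small `p` and `n`. Under the assumed convention, `p = 2` reduces to the unnormalized Walsh-Hadamard transform. The sketch only illustrates why the constant geometry indexing needs no per-stage stride computation; it says nothing about actual GPU instruction counts or register usage.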
