Optimized Fast Walsh-Hadamard Transform on GPUs for non-binary LDPC decoding

We analyse the role of the FWHT in non-binary LDPC decoding. We quantify the trade-off between shared memory bank conflicts and throughput on GPUs. The FWHT employs radix-n approaches tuned to the number of shared memory banks. The FWHT was tuned for GPU architectures with both 16 and 32 shared memory banks.

The Fourier Transform Sum-Product Algorithm (FT-SPA) used in non-binary Low-Density Parity-Check (LDPC) decoding makes extensive use of the Walsh-Hadamard Transform (WHT). We have developed a massively parallel Fast Walsh-Hadamard Transform (FWHT) that exploits the Graphics Processing Unit (GPU) pipeline and memory hierarchy. It minimizes the level of memory bank conflicts and maximizes the number of returned instructions per clock cycle across different generations of graphics processors, yielding considerable speedup gains in FT-SPA based non-binary LDPC decoding.
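
To make the transform and the bank-conflict issue concrete, the following is a minimal CPU-side sketch in Python, not the GPU implementation described above. It shows a textbook in-place radix-2 FWHT and a deliberately simplified conflict model. The function names (`fwht`, `first_operand_index`, `conflict_degree`), the thread-to-butterfly mapping, and the assumption that consecutive words fall in consecutive banks with one group of `banks` threads accessing shared memory together are all illustrative assumptions, not details taken from the paper.

```python
def fwht(values):
    """Unnormalized radix-2 fast Walsh-Hadamard transform (natural order).

    Returns a new list; the input length must be a power of two.
    """
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # One butterfly stage: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data


def first_operand_index(thread, h):
    """Index of the first butterfly operand handled by `thread` at stride h.

    Hypothetical mapping: thread t takes the t-th butterfly of the stage.
    """
    return (thread // h) * 2 * h + (thread % h)


def conflict_degree(h, banks):
    """Worst-case number of threads hitting the same bank at stride h.

    Simplified model (assumption): `banks` threads access shared memory
    together and word i lives in bank i % banks. A result of 1 means the
    access is conflict-free in this model.
    """
    hits = {}
    for thread in range(banks):
        bank = first_operand_index(thread, h) % banks
        hits[bank] = hits.get(bank, 0) + 1
    return max(hits.values())


if __name__ == "__main__":
    print(fwht([1, 0, 1, 0]))
    for banks in (16, 32):
        print(banks, [conflict_degree(h, banks) for h in (1, 2, banks)])
```

In this toy model the small-stride stages of a plain radix-2 schedule produce 2-way conflicts while a stride equal to the bank count is conflict-free, which illustrates why a transform might be restructured around the number of banks. The actual radix-n scheme and its measured behaviour are those reported by the authors, not this sketch.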
