High-throughput decoding of block turbo codes on graphics processing units

Block turbo codes (BTCs) provide powerful forward error correction (FEC) for applications such as optical networks and NAND flash memory devices. These applications require soft-decision FEC codes that guarantee a bit error rate (BER) below 10^-12, a level that is very difficult to verify with a CPU simulator. In this paper, we present high-throughput graphics processing unit (GPU) based turbo decoding software to aid the development of very low error rate BTCs. To use the GPU effectively, the software processes multiple BTC frames simultaneously and minimizes global memory access latency. In particular, the Chase-Pyndiah algorithm is efficiently parallelized to decode every row and column of a BTC word. The GPU-based simulator achieves throughputs of about 80 and 150 Mb/s when decoding BTCs composed of Hamming and BCH codes, respectively. These throughputs are up to 124 times higher than those of the CPU-based simulator.
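
A rough back-of-the-envelope calculation illustrates why a BER of 10^-12 is hard to verify in software. The sketch below is not taken from the paper: it assumes that about 100 bit errors must be observed for a usable estimate (a common rule of thumb), and it derives a hypothetical CPU throughput by dividing the quoted 150 Mb/s by the quoted maximum speedup of 124, although the abstract does not say that the two figures belong to the same code.

```python
def seconds_to_verify(target_ber, throughput_bps, min_errors=100):
    """Estimate the simulation time needed to observe `min_errors` bit errors.

    `min_errors` is an assumed rule-of-thumb value, not a figure from the paper.
    """
    bits_needed = min_errors / target_ber  # expected number of bits to decode
    return bits_needed / throughput_bps


SECONDS_PER_DAY = 86400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

GPU_BPS = 150e6          # about 150 Mb/s, quoted for BCH-based BTCs
CPU_BPS = GPU_BPS / 124  # hypothetical: assumes the 124x speedup applies here

gpu_days = seconds_to_verify(1e-12, GPU_BPS) / SECONDS_PER_DAY
cpu_years = seconds_to_verify(1e-12, CPU_BPS) / SECONDS_PER_YEAR

print(f"GPU: about {gpu_days:.1f} days")
print(f"CPU: about {cpu_years:.1f} years")
```

Under these assumptions the GPU run takes on the order of a week, while the CPU run takes years, which is the gap the simulator is meant to close.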
