A multi-standard efficient column-layered LDPC decoder for Software Defined Radio on GPUs

In this paper, we propose a multi-standard, high-throughput column-layered (CL) low-density parity-check (LDPC) decoder for Software-Defined Radio (SDR) on a Graphics Processing Unit (GPU) platform. Multiple columns in a sub-matrix of the quasi-cyclic LDPC (QC-LDPC) code are processed in parallel inside a block, while multiple codewords are decoded simultaneously across many blocks on the GPU. Several optimization methods are employed to enhance throughput: a compressed matrix structure, memory optimization, a codeword packing scheme, a two-dimensional thread configuration, and asynchronous data transfer. Experiments show that our decoder achieves a low bit error ratio and a peak throughput of 712 Mbps, which is about two orders of magnitude higher than that of a CPU implementation and comparable to dedicated hardware solutions. Compared to the fastest existing GPU-based implementation, the presented decoder achieves a 3.0x performance improvement.
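As a rough illustration of the compressed matrix structure and the column-parallel mapping mentioned above, the following Python sketch stores a QC-LDPC base matrix column by column and computes which check nodes a given column touches. It is a minimal sketch under stated assumptions, not the decoder's actual data layout: the toy base matrix, the sub-matrix size `z`, the shift convention (a shift `s` places the 1 of row `r` in column `(r + s) mod z`), and the function names are all hypothetical. The index `t` plays the role that a thread index inside a GPU block might play, with one block column handled at a time.

```python
# Hypothetical toy base matrix: -1 marks an all-zero sub-matrix,
# any other entry is the cyclic shift of a z-by-z identity matrix.
BASE = [
    [0, 1, -1, 2],
    [2, -1, 0, 1],
]
Z = 4  # assumed sub-matrix (circulant) size


def compress_base_matrix(base):
    """For each block column, keep only (block_row, shift) of non-zero circulants."""
    n_rows, n_cols = len(base), len(base[0])
    return [
        [(br, base[br][bc]) for br in range(n_rows) if base[br][bc] >= 0]
        for bc in range(n_cols)
    ]


def checks_for_column(compressed, z, bc, t):
    """Check-node rows connected to variable node bc*z + t.

    Assumed convention: row r of a circulant with shift s has its 1 in
    column (r + s) % z, so column t is hit by row (t - s) % z.
    """
    return [br * z + (t - s) % z for br, s in compressed[bc]]


def expand(base, z):
    """Expand the base matrix to the full parity-check matrix (for checking only)."""
    h = [[0] * (len(base[0]) * z) for _ in range(len(base) * z)]
    for br, row in enumerate(base):
        for bc, s in enumerate(row):
            if s >= 0:
                for r in range(z):
                    h[br * z + r][bc * z + (r + s) % z] = 1
    return h


COMPRESSED = compress_base_matrix(BASE)
```

In a column-layered schedule, the `z` columns of one block column are independent of each other, so each value of `t` could be assigned to its own thread while further codewords are assigned to further blocks; the compressed form lets each thread find its check nodes without scanning the zero sub-matrices.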
