Multicore and Manycore Implementations of ADMM-based Decoders for LDPC Decoding

The alternating direction method of multipliers (ADMM) algorithm has recently been proposed for decoding low-density parity-check (LDPC) codes with linear programming (LP) techniques. Although it improves error rate performance compared with the usual message passing (MP) techniques, it has a higher computational complexity. However, because the ADMM algorithm behaves like an MP decoder, it offers a significant step toward scalable and optimized LP decoding of LDPC codes. In this paper, an overview of the ADMM approach and its error correction performance is provided. Then, its computational and memory complexities are evaluated. Finally, optimized software implementations of the decoder that take advantage of multi/many-core device features are described. Optimization choices are discussed and justified according to execution profiling figures and the algorithm’s parallelism levels. Experimental results show that this LP-based decoding technique can meet the real-time throughput requirements of the WiMAX and WRAN standards on mid-range devices.
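To make the link between ADMM-based LP decoding and message passing concrete, the iteration can be sketched as follows. This is the formulation commonly given in the ADMM LP decoding literature, not a transcription of this paper's equations, and every symbol is an assumption introduced for illustration: $\gamma$ is the vector of channel log-likelihood ratios, $P_j$ selects the $d_j$ variables taking part in check $j$, $\mathbb{PP}_{d}$ is the parity polytope of dimension $d$, $z_j$ and $\lambda_j$ are the replica and scaled-dual vectors attached to check $j$, $d_i$ is the degree of variable $i$, $N_v(i)$ is the set of checks connected to variable $i$, and $\mu > 0$ is the penalty parameter.

```latex
% LP decoding problem, with one replica vector z_j per check node
\begin{equation}
  \min_{x \in [0,1]^N} \; \gamma^{T} x
  \quad \text{s.t.} \quad P_j x = z_j, \;\; z_j \in \mathbb{PP}_{d_j}
  \quad \forall j \in \mathcal{J}
\end{equation}

% Variable-node update: combine the "messages" (z, lambda) from neighboring checks
\begin{equation}
  x_i \leftarrow \Pi_{[0,1]}\!\left( \frac{1}{d_i} \left(
    \sum_{j \in N_v(i)} \left( z_j^{(i)} - \frac{\lambda_j^{(i)}}{\mu} \right)
    - \frac{\gamma_i}{\mu} \right) \right)
\end{equation}

% Check-node update: Euclidean projection onto the parity polytope
\begin{equation}
  z_j \leftarrow \Pi_{\mathbb{PP}_{d_j}}\!\left( P_j x + \frac{\lambda_j}{\mu} \right)
\end{equation}

% Dual (multiplier) update, also local to check j
\begin{equation}
  \lambda_j \leftarrow \lambda_j + \mu \left( P_j x - z_j \right)
\end{equation}
```

Read this way, the $x_i$ update depends only on values held by the checks adjacent to variable $i$, and the $z_j$ and $\lambda_j$ updates depend only on the variables adjacent to check $j$. This is the same locality that message passing schedules rely on, and it suggests why the updates can be spread across cores or SIMD lanes.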
