Optimization of Low-Density Parity Check decoder performance for OpenCL designs synthesized to FPGAs

Abstract

Open Computing Language (OpenCL) is a high-level language that allows developers to produce portable software for heterogeneous parallel computing platforms. OpenCL is available for a variety of hardware platforms, and compiler support has recently been expanded to include Field-Programmable Gate Arrays (FPGAs). This article investigates flexible OpenCL designs for the iterative min-sum decoding algorithm for (3,6)-regular Low-Density Parity Check (LDPC) codes over a range of codeword lengths. The target FPGA hardware is the Nallatech 385n board, which is based on the Altera Stratix V GX A7. The computationally demanding LDPC decoding algorithm offers several forms of parallelism that could be exploited by the Altera Offline Compiler (AOC version 15.1) for OpenCL. Our best decoder design produced a corrected codeword throughput of 68.22 Mbps at the compiler-selected FPGA clock frequency of 163.88 MHz for a length-2048 (3,6)-regular LDPC code. For a length-1024 (3,6)-regular LDPC code, our best design produced a throughput of 54.8 Mbps (32 decoding iterations), which significantly improves on the throughput of around 7 Mbps (30 decoding iterations) produced by an OpenCL-based decoder design reported by Falcao et al. for the same size of LDPC code.

[1] Ahsan Aziz, et al. High-Throughput FPGA-Based QC-LDPC Decoder Architecture, 2015, 2015 IEEE 82nd Vehicular Technology Conference (VTC2015-Fall).

[2] Bruce F. Cockburn, et al. Implementation of decoders for symmetric low density parity check codes on parallel computation platforms using OpenCL, 2016, 2016 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE).

[3] Gabriel Falcao, et al. Flexible design of wide-pipeline-based WiMAX QC-LDPC decoder architectures on FPGAs using high-level synthesis, 2014.

[4] Keshab K. Parhi, et al. A 54 Mbps (3,6)-regular FPGA LDPC decoder, 2002, IEEE Workshop on Signal Processing Systems.

[5] A. J. Blanksby, et al. A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density parity-check code decoder, 2001, IEEE J. Solid State Circuits.

[6] Swapnil Mhaske, et al. High-Throughput FPGA QC-LDPC Decoder Architecture for 5G Wireless, 2015.

[7] Shu Lin, et al. Error Control Coding, 2004.

[8] Bruce F. Cockburn, et al. A scalable LDPC decoder ASIC architecture with bit-serial message exchange, 2008, Integr.

[9] Joseph R. Cavallaro, et al. Semi-parallel reconfigurable architectures for real-time LDPC decoding, 2004, International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004.

[10] David Novo, et al. From low-architectural expertise up to high-throughput non-binary LDPC decoders: Optimization guidelines using high-level synthesis, 2015, 2015 25th International Conference on Field Programmable Logic and Applications (FPL).

[11] Shie Mannor, et al. Fully Parallel Stochastic LDPC Decoders, 2008, IEEE Transactions on Signal Processing.

[12] Vincent C. Gaudet, et al. Design of High-Throughput Fully Parallel LDPC Decoders Based on Wire Partitioning, 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[13] Aaftab Munshi, et al. The OpenCL specification, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[14] Borivoje Nikolic, et al. Low-density parity-check code constructions for hardware implementation, 2004, 2004 IEEE International Conference on Communications (IEEE Cat. No.04CH37577).

[15] P. Urard, et al. A 135Mb/s DVB-S2 compliant codec based on 64800b LDPC and BCH codes, 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005.

[16] David Novo, et al. Shortening Design Time through Multiplatform Simulations with a Portable OpenCL Golden-model: The LDPC Decoder Case, 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.

[17] Prabin Kumar Panda, et al. Synchronization techniques for 3rd generation partnership project-long term evolution (3GPP-LTE), 2011.

[18] Martin J. Wainwright, et al. An Efficient 10GBASE-T Ethernet LDPC Decoder Design With Low Error Floors, 2010, IEEE Journal of Solid-State Circuits.

[19] John G. Proakis, et al. Digital Communications, 1983.

[20] Ahsan Aziz, et al. A 2.48Gb/s QC-LDPC Decoder Implementation on the NI USRP-2953R, 2015, ArXiv.

[21] Alan D. George, et al. Comparative analysis of OpenCL vs. HDL with image-processing kernels on Stratix-V FPGA, 2015, 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP).

[22] Frederico Pratas, et al. Open the Gates: Using High-level Synthesis towards programmable LDPC decoders on FPGAs, 2013, 2013 IEEE Global Conference on Signal and Information Processing.

[23] Kiran Kumar Abburi, et al. A Scalable LDPC Decoder on GPU, 2011, 2011 24th International Conference on VLSI Design.

[24] C. E. Shannon, et al. A mathematical theory of communication, 1948, Bell System Technical Journal.

[25] Leonel Sousa, et al. Massive parallel LDPC decoding on GPU, 2008, PPoPP.