Enumerating the Joint Weight of a Binary Linear Code Using Parallel Architectures: Multi-core CPUs and GPUs

In this paper, we present a parallel algorithm for enumerating the joint weight of a binary linear $(n,k)$ code, with the aim of accelerating the assessment of its decoding error probability in network coding. Our algorithm is implemented on a multi-core CPU system using OpenMP and on an NVIDIA graphics processing unit (GPU) system using the compute unified device architecture (CUDA). To reduce the number of codeword pairs to be investigated, our parallel algorithm reduces the dimension $k$ by exploiting the all-one vector included in many practical codes. We also employ a population count instruction to compute the joint weight of codewords with fewer instructions. Furthermore, an efficient atomic vote-and-reduce scheme is deployed in our GPU-based implementation. We apply our CPU- and GPU-based implementations to a subcode of a $(127,22)$ BCH code to evaluate the impact of acceleration.
