Efficient parallel implementation of three‐point Viterbi decoding algorithm on CPU, GPU, and FPGA

In wireless communication, the Viterbi decoding algorithm (VDA) is one of the most popular channel decoding algorithms and is widely used in WLAN, WiMAX, and 3G communications. However, the throughput of a Viterbi decoder is constrained by the convolutional characteristic of the code. Recently, the three‐point VDA (TVDA) was proposed to solve this problem. In TVDA, the whole procedure is divided into three phases: the forward, trace‐back, and decoding phases. In this paper, we analyze the parallelism of TVDA and propose parallel TVDA implementations on the multi‐core CPU, graphics processing unit (GPU), and field programmable gate array (FPGA), and we demonstrate approaches that exploit the performance potential of each platform. For CPU platforms, we apply two optimization methods, single instruction multiple data (SIMD) and multithreading, to gain over 145× speedup over the naive CPU version on a quad‐core CPU platform. For GPU platforms, we propose a combination of cached memory optimization, coalesced global memory accesses, a codeword packing scheme, and asynchronous data transfer, achieving a throughput of 404.65 Mbps, a 12× speedup over the initial GPU version on an NVIDIA GeForce GTX580 card, and a 7× speedup over an Intel quad‐core i5‐2300 CPU; the two devices were manufactured in the same year, and both run fully optimized schemes. In addition, for FPGA platforms, we customize a radix‐4 pipelined architecture for the TVDA on a 45‐nm Xilinx FPGA chip (XC6VLX760). At a clock rate of 209.15 MHz, it achieves a throughput of 418.30 Mbps. Finally, we discuss the performance evaluation and efficiency comparison of these flexible architectures for real‐time Viterbi decoding in terms of decoding throughput, power consumption, optimization schemes, programming cost, and price. Copyright © 2013 John Wiley & Sons, Ltd.
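
For readers unfamiliar with the forward and trace‐back phases mentioned above, the following is a minimal sketch of the classic hard‐decision Viterbi algorithm, not the three‐point variant and not the authors' implementation. It assumes a rate‐1/2 convolutional code with constraint length 3 and generator polynomials (7, 5) in octal; that code choice and all function names (`step`, `encode`, `viterbi_decode`) are illustrative assumptions, not taken from the paper.

```python
# Sketch only: classic Viterbi decoding for an assumed (7, 5) rate-1/2 code.
G = (0b111, 0b101)   # assumed generator polynomials, constraint length 3
N_STATES = 4         # 2^(constraint length - 1)


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def step(state, bit):
    """Advance the encoder by one input bit; return (next_state, output pair)."""
    reg = (bit << 2) | state
    out = tuple(parity(reg & g) for g in G)
    return reg >> 1, out


def encode(bits):
    """Convolutionally encode a list of bits, starting from state 0."""
    state, out = 0, []
    for b in bits:
        state, sym = step(state, b)
        out.extend(sym)
    return out


def viterbi_decode(received):
    """Decode a list of hard-decision bits (two per input bit)."""
    n = len(received) // 2
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)   # the encoder starts in state 0
    survivors = []                          # per step: (prev_state, bit) for each state

    # Forward phase: add-compare-select over every trellis stage.
    for t in range(n):
        r = received[2 * t:2 * t + 2]
        new_metric = [inf] * N_STATES
        choice = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue
            for b in (0, 1):
                ns, sym = step(s, b)
                m = metric[s] + (sym[0] != r[0]) + (sym[1] != r[1])  # Hamming distance
                if m < new_metric[ns]:
                    new_metric[ns] = m
                    choice[ns] = (s, b)
        metric = new_metric
        survivors.append(choice)

    # Trace-back phase: follow survivor pointers from the best final state,
    # emitting the decoded bits in reverse order.
    state = min(range(N_STATES), key=metric.__getitem__)
    decoded = []
    for choice in reversed(survivors):
        state, b = choice[state]
        decoded.append(b)
    decoded.reverse()
    return decoded
```

In this sketch each trellis stage in the forward phase depends on the path metrics of the previous stage, which is the sequential dependence that limits throughput; the state loop inside one stage is independent across states and is the natural target for SIMD or GPU threads.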
