An efficient CPU‐GPU hybrid parallel implementation for DVB‐RCS2 receiver

The second‐generation digital video broadcasting return channel via satellite (DVB‐RCS2) is a promising real‐time wireless protocol that is widely used in applications such as video conferencing, video feeds, and video multicasting. However, receiver‐side processing in DVB‐RCS2 is time consuming and needs to be accelerated by high‐performance processing systems. Graphics processing units (GPUs) are now applied in communication systems because of their high parallel capability and processing throughput. In this study, we design a novel receiver pipeline for the CPU‐GPU platform. Moreover, we propose a CPU‐GPU hybrid strategy that fully utilizes the resources of both processors and reduces communication latency. Compared with a parallel turbo decoder proposed in other work on the same platform, our parallel implementation achieves higher throughput. For the entire DVB‐RCS2 receiver, our proposed pipeline obtains speedups of 20 times and 6 times over the non‐pipelined serial and non‐pipelined parallel algorithms, respectively. In addition, the latency of our implementation is lower than that of the non‐pipelined CPU‐GPU implementation, which is equal to 1.06 ms.
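
The throughput benefit of pipelining receiver stages can be illustrated with a simple timing model. The sketch below is not the authors' implementation: the stage durations and frame count are hypothetical values chosen for illustration, and the model assumes every stage can run concurrently on a different frame with no transfer overhead.

```python
def non_pipelined_time(stage_ms, n_frames):
    """Total time when each frame passes through all stages before the next frame starts."""
    return n_frames * sum(stage_ms)


def pipelined_time(stage_ms, n_frames):
    """Total time for an ideal pipeline: the first frame fills the pipeline,
    then one frame completes every max(stage_ms) milliseconds."""
    return sum(stage_ms) + (n_frames - 1) * max(stage_ms)


def pipeline_speedup(stage_ms, n_frames):
    """Ratio of non-pipelined to pipelined total time."""
    return non_pipelined_time(stage_ms, n_frames) / pipelined_time(stage_ms, n_frames)


if __name__ == "__main__":
    # Hypothetical per-frame stage durations in milliseconds (not taken from the paper).
    stages = [1.0, 2.0, 1.0]
    frames = 10
    print("non-pipelined:", non_pipelined_time(stages, frames), "ms")
    print("pipelined:    ", pipelined_time(stages, frames), "ms")
    print("speedup:      ", pipeline_speedup(stages, frames))
```

In this idealized model the speedup approaches sum(stage_ms) / max(stage_ms) as the number of frames grows, so balancing the stage durations matters as much as accelerating any single stage.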
