Implementation of LTE system on an SDR platform using CUDA and UHD

In this paper, we present an implementation of a long term evolution (LTE) system on a software defined radio (SDR) platform. The platform is a conventional personal computer equipped with a graphics processing unit (GPU), which implements the SDR software modem, and a universal software radio peripheral 2 (USRP2) driven by the USRP hardware driver (UHD), which serves as the radio frequency (RF) transceiver. The central processing unit (CPU) runs C++ control code that accesses the USRP2 through the UHD. We adopted the Ettus Research UHD because it offers a high degree of flexibility in designing the transceiver chain; taking advantage of this flexibility, we also implemented a simple cognitive radio engine using libraries provided by the UHD. The software modem runs on the GPU, whose many arithmetic logic units make it well suited to parallel computing. We propose a parallel programming method that exploits the single instruction, multiple data (SIMD) architecture of the GPU. We focus on the Turbo decoder because it is computationally demanding and its algorithm is difficult to parallelize. The system is analyzed primarily in terms of computation time using the compute unified device architecture (CUDA) profiler. In experimental tests, the total processing time for a single LTE frame was 5.00 ms for transmission and 8.58 ms for reception. Both values are below the 10 ms duration of an LTE radio frame, which confirms that the implemented system can perform all the baseband signal processing required for LTE systems in real time.
