Accelerating the data shuffle operations for FFT algorithms on SIMD DSPs

The FFT is a key kernel of OFDM in the 3GPP-LTE system. Many researchers use SIMD DSPs to accelerate FFT algorithms because about 75% of the FFT workload is amenable to SIMD execution. This paper analyzes in detail how to accelerate FFT algorithms on SIMD DSPs, focusing on the data shuffle operations. We propose an EXC instruction for SIMD DSPs that exchanges specified elements between two vector registers in one cycle. Compared with the vhalfup and vhalfdn instructions implemented in the VIRAM processor, the EXC instruction achieves speedups ranging from 1.18× to 1.37× and reduces the dynamic code size by up to 15%. In addition, we present two suggestions for designing architectures targeted at 3G/4G wireless communication systems.
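
As an illustration only, the element exchange described above can be modeled in scalar Python. The abstract does not specify the encoding or operands of EXC, so the function name `exc`, the per-lane `mask` operand, and the 8-lane register width used here are all assumptions; this is a sketch of one plausible semantics, not the instruction's definition.

```python
def exc(va, vb, mask):
    """Hypothetical model of an EXC-style exchange (assumed semantics).

    For every lane i where mask[i] is truthy, the elements va[i] and vb[i]
    are swapped; all other lanes are left unchanged. Returns new lists.
    """
    if not (len(va) == len(vb) == len(mask)):
        raise ValueError("va, vb and mask must have the same number of lanes")
    new_a = [b if m else a for a, b, m in zip(va, vb, mask)]
    new_b = [a if m else b for a, b, m in zip(va, vb, mask)]
    return new_a, new_b


# Assumed 8-lane registers holding FFT sample indices 0..15.
va = [0, 1, 2, 3, 4, 5, 6, 7]
vb = [8, 9, 10, 11, 12, 13, 14, 15]

# Exchange the upper halves of the two registers, the kind of shuffle
# needed between butterfly stages.
upper_half = [0, 0, 0, 0, 1, 1, 1, 1]
va2, vb2 = exc(va, vb, upper_half)
print(va2)  # [0, 1, 2, 3, 12, 13, 14, 15]
print(vb2)  # [8, 9, 10, 11, 4, 5, 6, 7]
```

Under this assumed model, both registers are updated by a single operation, which is the property the abstract credits for the reduced cycle count and dynamic code size.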