Language-Extension-Based Vectorizing Compiling Scheme on SDR-DSP

In this paper we propose a Language-Extension-based Vectorizing Compiling Scheme (LEVCS) for SDR-DSP, a newly developed DSP designed mainly for Software-Defined Radio (SDR). The SDR-DSP architecture combines the VLIW (Very Long Instruction Word) and SIMD (Single Instruction Multiple Data) styles. Vectorization is essential for exploiting the potential of SDR-DSP and achieving high performance. Because auto-vectorization techniques cannot meet the requirements of typical applications, LEVCS directs vectorization through a C language extension called SDR-DSP-C. LEVCS uses flexible data reorganization to make vectorization on SDR-DSP more efficient. We use LEVCS to vectorize five benchmark kernels: Fast Fourier Transform (FFT), Finite Impulse Response filter (FIR), Infinite Impulse Response filter (IIR), dot product (Dotprod), and vector sum (vecsum). Experimental results show that LEVCS is functionally correct and achieves speedups of 2.883 to 8.074 over TI DSPs.
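The abstract does not show SDR-DSP-C syntax or the SIMD width of SDR-DSP, so the following is only a portable C sketch of what vectorizing a kernel such as Dotprod involves in general, not LEVCS output. It strip-mines the loop into groups of `LANES` elements (a hypothetical width of 4), keeps one partial sum per lane, reduces the lanes at the end, and handles leftover elements in a scalar epilogue. The names `dotprod_scalar`, `dotprod_lanes`, and `LANES` are illustrative assumptions.

```c
#include <assert.h>

/* Hypothetical SIMD width; the real SDR-DSP lane count is not given here. */
#define LANES 4

/* Scalar reference kernel: sum of a[i] * b[i]. */
static long dotprod_scalar(const short *a, const short *b, int n) {
    long acc = 0;
    for (int i = 0; i < n; i++)
        acc += (long)a[i] * b[i];
    return acc;
}

/* Strip-mined form: each lane_acc[l] models one SIMD lane. The inner
   loop over l is what a vectorizer would map to a single vector
   multiply-accumulate instruction. */
static long dotprod_lanes(const short *a, const short *b, int n) {
    assert(n >= 0);
    long lane_acc[LANES] = {0};
    int i = 0;
    for (; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; l++)
            lane_acc[l] += (long)a[i + l] * b[i + l];

    /* Horizontal reduction across lanes. */
    long acc = 0;
    for (int l = 0; l < LANES; l++)
        acc += lane_acc[l];

    /* Scalar epilogue for elements left over when n % LANES != 0. */
    for (; i < n; i++)
        acc += (long)a[i] * b[i];
    return acc;
}
```

Keeping per-lane partial sums lets every lane accumulate independently inside the main loop, so only one cross-lane reduction is needed at the end. This reduction is a small instance of the data reorganization that a vectorizing scheme must handle.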
