Low-power implementation of an OFDM based channel receiver in real-time using a low-end media processor

Implementing advanced channel receivers on low-end multimedia instruction set processors is a productive, flexible and cost-effective alternative to custom hardware. The stringent real-time and low-power requirements of these applications become attainable provided that the impact of data transfer and storage is first drastically reduced. This paper illustrates the implementation of a real-time, low-power OFDM-based channel receiver on a TriMedia TM1300 processor, obtained by applying our Data Transfer and Storage Exploration methodology. In particular, we focus on exploring alternative data formats in background memory for efficient sub-word level acceleration. The outcome of our approach is an optimized source code description of the channel receiver that optimally exploits the existing memory hierarchy while using the available SIMD instructions in a cost-effective manner. Following this approach, we have achieved more than an order of magnitude reduction in the energy consumed in the memory hierarchy while still executing in real time. Moreover, the slack gained in execution time makes it possible to lower the frequency, and thus the voltage, of the CPU core towards the minimum recommended by the processor’s specification. This allows an extra 36% energy reduction in the CPU core.
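To make the idea of a sub-word friendly data format concrete, the following is a minimal sketch in portable C and not the receiver code described in the paper. It assumes 16-bit samples stored two per 32-bit word, so that one memory access fetches a pair and one word-wide operation updates both halves. The names `pack16x2`, `unpack_hi`, `unpack_lo`, `dual_add16` and `add_packed` are hypothetical, and the dual add is emulated in plain C instead of using the processor's own SIMD operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit samples into one 32-bit word:
 * `hi` occupies bits 31..16 and `lo` occupies bits 15..0. */
static uint32_t pack16x2(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Extract the halves again. The uint16_t -> int16_t conversion of values
 * above 0x7FFF is implementation-defined in C, but it wraps to the
 * two's-complement value on mainstream compilers. */
static int16_t unpack_hi(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t unpack_lo(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Emulated dual 16-bit add with wrap-around: each half is added
 * independently, so a carry out of the low half never leaks into the
 * high half. On a SIMD-capable core this would map to one operation. */
static uint32_t dual_add16(uint32_t a, uint32_t b) {
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) + (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Add two arrays of packed sample pairs: n words in, n words out,
 * i.e. 2*n samples processed with n loads per input array. */
static void add_packed(const uint32_t *x, const uint32_t *y,
                       uint32_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = dual_add16(x[i], y[i]);
    }
}
```

With an unpacked layout (one sample per word), the same loop would need twice as many memory accesses for the same number of samples, which is the kind of trade-off a data format exploration weighs against the cost of packing and unpacking.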
