Exploiting finite precision information to guide data-flow mapping

Advanced handheld applications demand implementations with higher energy efficiency and higher performance. In typical implementations, finite precision information becomes available only after fixed-point refinement, once the data-flow has been frozen. In this paper we instead propose propagating finite precision information to drive data-flow transformations and so achieve a higher mapping efficiency. Given a flexible architecture with low run-time switching overhead, the data-flow under execution can then be tuned opportunistically to provide the instantaneous computational accuracy required by the application. This minimizes both the average number of operations and their precision. The principle is demonstrated with the implementation of the 128-point FFT present in a WLAN receiver. Compared to a conventional implementation, the number of cycles can be reduced by 49% to 65%, depending on conditions external to the receiver.
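The run-time tuning idea can be illustrated with a minimal sketch, which is not the paper's method. It assumes the required accuracy is expressed as a target signal-to-quantization-noise ratio (SQNR) and uses the textbook rule of thumb for an ideal uniform quantizer driven by a full-scale sine (about 6.02 dB per bit plus 1.76 dB). The variant names and cycle counts below are hypothetical placeholders, not measurements from the paper.

```python
import math

# Hypothetical pre-compiled data-flow variants of the same FFT kernel:
# (name, word length in bits, relative cycle cost). All values are
# illustrative placeholders, not results reported in the paper.
VARIANTS = [
    ("fft128_8bit", 8, 100),
    ("fft128_12bit", 12, 150),
    ("fft128_16bit", 16, 200),
]


def required_bits(target_sqnr_db: float) -> int:
    """Smallest word length whose ideal SQNR (6.02*N + 1.76 dB) meets the target."""
    return max(1, math.ceil((target_sqnr_db - 1.76) / 6.02))


def select_variant(target_sqnr_db: float, variants=VARIANTS):
    """Pick the cheapest variant whose word length satisfies the target SQNR."""
    need = required_bits(target_sqnr_db)
    feasible = [v for v in variants if v[1] >= need]
    if not feasible:
        raise ValueError(f"no variant offers {need} bits")
    # Among sufficiently precise variants, minimize the cycle cost.
    return min(feasible, key=lambda v: v[2])


if __name__ == "__main__":
    for sqnr_db in (40.0, 50.0, 90.0):
        name, bits, cycles = select_variant(sqnr_db)
        print(f"target {sqnr_db} dB -> {name} ({bits} bits, cost {cycles})")
```

In this sketch the selection runs whenever the external conditions change, so the kernel only pays for the precision it currently needs. A real system would derive the precision requirement from the application's own error analysis rather than from the sine-wave rule used here.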
