Improving Energy Efficiency in FPGA Through Judicious Mapping of Computation to Embedded Memory Blocks

Field-programmable gate arrays (FPGAs) are increasingly used as prototyping and accelerator platforms in diverse application domains, such as digital signal processing (DSP), security, and real-time multimedia processing. However, mapping these applications to FPGAs typically yields poor energy efficiency because of the high energy overhead of the programmable interconnects (PIs) in FPGA devices. This paper presents an energy-efficient heterogeneous application mapping framework for FPGAs, in which conventional mapping to logic and DSP blocks (for DSP-enhanced FPGA devices) is combined with judicious mapping of specific computations to embedded memory blocks. A complete mapping methodology, including functional decomposition, fusion, and optimal packing of operations, is proposed and used to reduce the large energy overhead of the PIs. The effectiveness of the proposed methodology is verified for a set of common applications using a commercial FPGA system. Experimental results show that the proposed heterogeneous mapping approach achieves significant energy improvement across different input bit-widths (e.g., more than 35% energy savings for inputs of 8 bits or fewer, compared to the corresponding mapping in configurable logic blocks). To reduce energy further, we propose an energy/accuracy tradeoff approach in which the input operand bit-width is dynamically truncated to reduce memory area and energy at the expense of a modest degradation in output accuracy. We show that, using a preferential truncation method, up to 88.6% energy savings can be achieved in a 32-tap finite impulse response filter with modest impact on filter performance.
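The idea of computing in memory and trading input bit-width for table size can be illustrated with a small software model. The sketch below is not the paper's mapping flow or its preferential truncation method. It assumes an unsigned input, a single constant-coefficient multiplication stored as a lookup table, and truncation of the least significant input bits. The function names (`build_lut`, `lut_multiply`, `max_abs_error`) and the coefficient value are hypothetical and chosen only for illustration.

```python
def build_lut(coeff, in_bits, keep_bits):
    """Precompute coeff * x for every address formed by the kept high bits.

    Dropping (in_bits - keep_bits) low bits halves the table once per
    dropped bit, which is the memory-area saving the model illustrates.
    """
    drop = in_bits - keep_bits
    return [coeff * (addr << drop) for addr in range(1 << keep_bits)]


def lut_multiply(lut, x, in_bits, keep_bits):
    """Approximate coeff * x with one table read on the truncated input."""
    drop = in_bits - keep_bits
    return lut[x >> drop]


def max_abs_error(coeff, in_bits, keep_bits):
    """Worst-case absolute error over all unsigned in_bits-wide inputs."""
    lut = build_lut(coeff, in_bits, keep_bits)
    return max(
        abs(coeff * x - lut_multiply(lut, x, in_bits, keep_bits))
        for x in range(1 << in_bits)
    )


if __name__ == "__main__":
    COEFF = 13  # hypothetical filter coefficient
    for keep in (8, 6, 4):
        entries = len(build_lut(COEFF, 8, keep))
        print(keep, entries, max_abs_error(COEFF, 8, keep))
```

In this model each truncated bit halves the number of table entries while the worst-case error grows with the weight of the dropped bits. Actual energy savings depend on the device's memory block organization and are not captured here.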
