Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation

Approximate computing has received significant attention as a promising strategy to decrease power consumption of inherently error tolerant applications. In this paper, we focus on hardware-level approximation by introducing the partial product perforation technique for designing approximate multiplication circuits. We prove in a mathematically rigorous manner that in partial product perforation, the imposed errors are bounded and predictable, depending only on the input distribution. Through extensive experimental evaluation, we apply the partial product perforation method on different multiplier architectures and expose the optimal architecture-perforation configuration pairs for different error constraints. We show that, compared with the respective exact design, the partial product perforation delivers reductions of up to 50% in power consumption, 45% in area, and 35% in critical delay. In addition, the product perforation method is compared with the state-of-the-art approximation techniques, i.e., truncation, voltage overscaling, and logic approximation, showing that it outperforms them in terms of power dissipation and error.

[1]  D. Zuras,et al.  Balanced delay trees and combinatorial division in VLSI , 1986 .

[2]  Yang Liu,et al.  Computation Error Analysis in Digital Signal Processing Systems With Overscaled Supply Voltage , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[3]  Sherief Reda,et al.  DRUM: A Dynamic Range Unbiased Multiplier for approximate applications , 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[4]  E. J. King,et al.  Data-dependent truncation scheme for parallel multipliers , 1997, Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers (Cat. No.97CB36136).

[5]  Lingamneni Avinash,et al.  Synthesizing Parsimonious Inexact Circuits through Probabilistic Design Techniques , 2013, TECS.

[6]  E. Swartzlander,et al.  Truncated multiplication with correction constant [for DSP] , 1993, Proceedings of IEEE Workshop on VLSI Signal Processing.

[7]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[8]  Taejoon Park,et al.  Energy-Efficient Approximate Multiplication for Digital Signal Processing and Classification Applications , 2015, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[9]  Kaushik Roy,et al.  MACACO: Modeling and analysis of circuits for approximate computing , 2011, 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[10]  Fabrizio Lombardi,et al.  New Metrics for the Reliability of Approximate and Probabilistic Adders , 2013, IEEE Transactions on Computers.

[11]  Kaushik Roy,et al.  Analysis and characterization of inherent application resilience for approximate computing , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[12]  Behrooz Parhami,et al.  Computer arithmetic - algorithms and hardware designs , 1999 .

[13]  Fabrizio Lombardi,et al.  Design and Analysis of Approximate Compressors for Multiplication , 2015, IEEE Transactions on Computers.

[14]  Zhi-Hui Kong,et al.  Design of Low-Power High-Speed Truncation-Error-Tolerant Adder and Its Application in Digital Signal Processing , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[15]  Fabrizio Lombardi,et al.  A low-power, high-performance approximate multiplier with configurable partial error recovery , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[16]  Puneet Gupta,et al.  Trading Accuracy for Power with an Underdesigned Multiplier Architecture , 2011, 2011 24th Internatioal Conference on VLSI Design.

[17]  Paolo Ienne,et al.  Variable Latency Speculative Addition: A New Paradigm for Arithmetic Circuit Design , 2008, 2008 Design, Automation and Test in Europe.

[18]  Kaushik Roy,et al.  IMPACT: IMPrecise adders for low-power approximate computing , 2011, IEEE/ACM International Symposium on Low Power Electronics and Design.

[19]  Henry Hoffmann,et al.  Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.

[20]  Christoforos E. Kozyrakis,et al.  Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[21]  Anand Raghunathan,et al.  Best-effort computing: Re-thinking parallel software and hardware , 2010, Design Automation Conference.

[22]  Kiamal Z. Pekmestzi,et al.  Approximate Multiplier Architectures Through Partial Product Perforation: Power-Area Tradeoffs Analysis , 2015, ACM Great Lakes Symposium on VLSI.

[23]  Bijoy Antony Jose,et al.  Delay Optimized Redundant Binary Adders , 2006, 2006 13th IEEE International Conference on Electronics, Circuits and Systems.

[24]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Florent de Dinechin,et al.  Multipliers for floating-point double precision and beyond on FPGAs , 2011, CARN.