Automated High-Level Generation of Low-Power Approximate Computing Circuits

Numerous application domains (e.g., signal and image processing, computer graphics, computer vision, and machine learning) are inherently error tolerant, which can be exploited to produce approximate ASIC implementations with low power consumption at the expense of negligible or small reductions in application quality. A major challenge is the need for high-level approximate design generation tools that work automatically on arbitrary designs. In this article, we provide an expanded and improved treatment of our ABACUS methodology, which aims to automatically generate approximate designs directly from their behavioral register-transfer level (RTL) descriptions, enabling a wider range of possible approximations. ABACUS starts by creating an abstract syntax tree (AST) from the input behavioral RTL description of a circuit, and then applies variant operators to the AST to create acceptable approximate designs. The devised variant operators include data type simplifications, arithmetic operation approximations, arithmetic expression transformations, variable-to-constant substitutions, and loop transformations. A design space exploration technique is devised to explore the space of possible approximate design variants and to identify the designs along the Pareto frontier that represents the trade-off between accuracy and power consumption. In addition, ABACUS prioritizes generating approximate designs that, when synthesized, lead to circuits with simplified critical paths, which are exploited to realize complementary power savings through standard voltage scaling. We integrate ABACUS with a standard ASIC design flow, and evaluate it on four realistic benchmarks from three different domains: machine learning, signal processing, and computer vision. Our tool automatically generates many approximate design variants with large power savings, while maintaining good accuracy.
We demonstrate the scalability of ABACUS by parallelizing the flow and using recent standard synthesis tools. Compared to our previous efforts, the new ABACUS tool provides up to a 20.5× speed-up in runtime, while generating approximate circuits that lead to additional power savings of up to 40 percent.
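
To make the notion of a Pareto frontier between accuracy and power concrete, the following is a minimal sketch of a non-dominated filter over candidate designs. It is not the design space exploration technique used by ABACUS; the function name `pareto_frontier`, the `(error, power)` representation, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: each candidate design is (error, power).
# Lower is better for both objectives.
Design = Tuple[float, float]


def pareto_frontier(designs: List[Design]) -> List[Design]:
    """Return the designs that no other design dominates.

    A design dominates another if it is no worse in both error and power
    and strictly better in at least one of them.
    """
    frontier: List[Design] = []
    best_power = float("inf")
    # Sort by error (then power) and sweep: a design is kept only if it
    # uses strictly less power than every design with equal or lower error.
    for error, power in sorted(set(designs)):
        if power < best_power:
            frontier.append((error, power))
            best_power = power
    return frontier


# Illustrative values only (not measurements from the article).
candidates = [
    (0.00, 10.0),  # exact design
    (0.01, 7.0),
    (0.02, 8.0),   # dominated by (0.01, 7.0)
    (0.05, 4.0),
    (0.05, 5.0),   # dominated by (0.05, 4.0)
]
frontier = pareto_frontier(candidates)
```

With these example values, the sweep keeps the exact design and the two variants that each trade additional error for strictly lower power; the remaining two are discarded because another candidate is at least as good in both objectives.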
