A Novel Method for the Approximation of Multiplierless Constant Matrix Vector Multiplication

Because human perception is limited, many digital signal processing (DSP) applications, such as image and video processing, do not require their outputs to be computed exactly. The outputs can instead be approximated to reduce the area, delay, and/or power dissipation of the design. This paper presents an approximation algorithm, called AURA, for the multiplierless design of constant matrix vector multiplication (CMVM), a ubiquitous operation in DSP systems. AURA tunes the constants so that the resulting matrix leads to a CMVM design that requires the fewest adders/subtractors while satisfying the given error constraints. The paper also introduces a modified version, called AURA-DC, which reduces the delay of the CMVM operation at the cost of a small increase in the number of adders/subtractors. Experimental results show that, compared with the original realizations, the proposed algorithms significantly reduce the number of adders/subtractors without violating the error constraints and consequently lead to CMVM designs with less area, delay, and power dissipation. Moreover, they can generate alternative CMVM designs under different error constraints, enabling a designer to choose the one that best fits a given application.
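The idea of tuning constants to save adders/subtractors can be illustrated with a minimal sketch. It is not the AURA algorithm itself. It assumes a simple digit-based cost model in which each constant is written in canonical signed digit (CSD) form, each output row is built from shifts and adders/subtractors with no sharing of partial terms, and the error constraint is a hypothetical bound `delta` on how far each constant may move. The function names and the example matrix are illustrative only.

```python
def csd_digits(n):
    """CSD digits of integer n, least significant first (each digit is -1, 0, or 1)."""
    digits = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (n % 4)  # +1 when n mod 4 == 1, -1 when n mod 4 == 3
            digits.append(d)
            n -= d
        n //= 2
    return digits


def nonzero_count(n):
    """Number of nonzero CSD digits, i.e. shifted terms needed to form n * x."""
    return sum(1 for d in csd_digits(n) if d != 0)


def adder_cost(matrix):
    """Adders/subtractors per row = (total shifted terms - 1), with no sharing (assumed model)."""
    cost = 0
    for row in matrix:
        terms = sum(nonzero_count(c) for c in row)
        cost += max(terms - 1, 0)
    return cost


def tune_constant(c, delta):
    """Pick the value within [c - delta, c + delta] with the fewest nonzero CSD digits.

    Ties are broken by closeness to c, then by the smaller candidate.
    """
    candidates = range(c - delta, c + delta + 1)
    return min(candidates, key=lambda v: (nonzero_count(v), abs(v - c), v))


def tune_matrix(matrix, delta):
    """Tune every constant independently under the hypothetical per-constant bound."""
    return [[tune_constant(c, delta) for c in row] for row in matrix]


# Illustrative 2x2 constant matrix (not taken from the paper).
original = [[11, 7], [5, 13]]
tuned = tune_matrix(original, delta=1)

print("original:", original, "adders/subtractors:", adder_cost(original))
print("tuned:   ", tuned, "adders/subtractors:", adder_cost(tuned))
```

Under this assumed model, allowing each constant to deviate by at most 1 lets the search replace digit-heavy constants with nearby values that need fewer shifted terms. A practical method would also need to account for sharing of partial products and for error measured at the outputs rather than per constant.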
