kNN-CAM: A k-Nearest Neighbors-based Configurable Approximate Floating Point Multiplier

In many real-world computations, such as the arithmetic operations in the hidden layers of a neural network, some amount of inaccuracy can be tolerated without degrading the final results (e.g., maintaining the same level of accuracy for image classification). This paper presents the design of kNN-CAM, a k-Nearest Neighbors (kNN)-based Configurable Approximate floating point Multiplier. kNN-CAM exploits approximate computing opportunities to deliver significant area and energy savings. A kNN engine is trained on a sufficiently large set of input data to learn the amount of bit truncation that can be applied to each floating-point input while minimizing energy and area. The trained engine is then used to predict the level of approximation for unseen data. Experimental results show that kNN-CAM provides about 67% area savings and a 19% speedup while losing only 4.86% accuracy compared to a fully accurate multiplier. Furthermore, applying kNN-CAM to a handwritten digit recognition implementation yields 47.2% area savings while accuracy drops by only 0.3%.
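To make the train-then-predict flow concrete, below is a minimal Python sketch of the idea. It is an illustrative software model only, not the authors' hardware design: the feature choice (the raw operand pair), the error tolerance TOL, the candidate truncation levels, and the helpers truncate_mantissa and approx_mul are all assumptions introduced here, and scikit-learn's KNeighborsClassifier stands in for the paper's kNN engine.

```python
# Hypothetical sketch of kNN-guided mantissa truncation; names and
# parameters here are illustrative assumptions, not the paper's design.
import struct
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def truncate_mantissa(x: float, bits: int) -> float:
    """Zero out the lowest `bits` mantissa bits of a float64 (bits <= 52)."""
    raw = struct.unpack('<Q', struct.pack('<d', x))[0]
    raw &= ~((1 << bits) - 1)          # clear low-order mantissa bits
    return struct.unpack('<d', struct.pack('<Q', raw))[0]

def approx_mul(a: float, b: float, bits: int) -> float:
    """Approximate product: truncate both operands, then multiply."""
    return truncate_mantissa(a, bits) * truncate_mantissa(b, bits)

# Toy training set: label each operand pair with the most aggressive
# truncation level whose relative product error stays below a tolerance.
rng = np.random.default_rng(0)
pairs = rng.uniform(0.1, 100.0, size=(500, 2))
TOL = 1e-3                             # assumed error budget
labels = []
for a, b in pairs:
    exact = a * b
    level = 0
    for bits in (48, 40, 32, 24, 16, 8, 0):
        if abs(approx_mul(a, b, bits) - exact) / abs(exact) < TOL:
            level = bits
            break
    labels.append(level)

# Train the kNN engine, then use it to pick a truncation level for
# unseen operands at run time.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(pairs, labels)

a, b = 3.14159, 2.71828
bits = int(knn.predict([[a, b]])[0])
print(approx_mul(a, b, bits), 'vs exact', a * b)
```

In the hardware setting the paper targets, the predicted truncation level would configure how many low-order mantissa bits the multiplier ignores; the model above only demonstrates how such labels could be generated offline and predicted for unseen inputs.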
