CNN Inference Using a Preprocessing Precision Controller and Approximate Multipliers With Various Precisions

This article proposes boosting multiplication performance for convolutional neural network (CNN) inference using a precision-prediction preprocessor that controls approximate multipliers of various precisions. Utilizing approximate multipliers for CNN inference has previously been proposed to improve power, speed, and area at the cost of a tolerable drop in accuracy. Low-precision approximate multipliers can achieve massive performance gains; however, using them alone is not feasible because of the large accuracy loss they cause. To maximize the multiplication performance gains while minimizing the accuracy loss, this article proposes using a tiny two-class precision controller to employ low- and high-precision approximate multipliers in a hybrid fashion. The performance benefits of the proposed concept are presented for multi-core multi-precision architectures and for single-core reconfigurable architectures. In addition, a merged reconfigurable approximate multiplier design with two precisions is proposed for single-core architectures. For performance comparison, several segment-based approximate multipliers with different precisions were synthesized in 15-nm CMOS technology. For accuracy evaluation, the concept was simulated on VGG19, Xception, and DenseNet201 using the ImageNetV2 dataset. The results demonstrate that the proposed concept achieves significant performance gains with minimal accuracy loss compared with designs that use exact multipliers or single-precision approximate multipliers.
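Since this is an abstract, the controller and multiplier microarchitectures are not detailed here. The Python sketch below illustrates the general idea under stated assumptions: `approx_mul` is a generic segment-based (truncation) approximate multiplier that keeps only the top k bits of each operand starting at its leading one, and `hybrid_mul` uses a hypothetical stand-in controller that thresholds the operands' leading-one positions to route large products to the higher-precision multiplier. The function names, bit widths (`k_lo`, `k_hi`), and `bits_threshold` are illustrative assumptions, not the paper's actual design.

```python
import random

def segment(x: int, k: int) -> tuple[int, int]:
    """Keep the k bits of x starting at its leading one; return (segment, shift)."""
    shift = max(0, x.bit_length() - k)
    return x >> shift, shift

def approx_mul(a: int, b: int, k: int) -> int:
    """Segment-based approximate multiply: multiply the short k-bit segments
    exactly, then shift the product back into place. The error is roughly a
    relative error bounded by the discarded low-order bits."""
    sa, sha = segment(a, k)
    sb, shb = segment(b, k)
    return (sa * sb) << (sha + shb)

def hybrid_mul(a: int, b: int, k_lo: int = 4, k_hi: int = 8,
               bits_threshold: int = 14) -> int:
    """Hypothetical two-class precision controller (assumption, not the
    paper's design): a cheap preprocessing check on the operands' leading-one
    positions routes large products, which dominate the convolution's
    accumulated sum, to the higher-precision multiplier."""
    k = k_hi if a.bit_length() + b.bit_length() >= bits_threshold else k_lo
    return approx_mul(a, b, k)

if __name__ == "__main__":
    # Compare mean relative error of low-only, high-only, and hybrid schemes
    # on random 10-bit unsigned operand pairs.
    random.seed(0)
    pairs = [(random.randrange(1, 1 << 10), random.randrange(1, 1 << 10))
             for _ in range(10000)]
    for name, mul in [("low only", lambda a, b: approx_mul(a, b, 4)),
                      ("high only", lambda a, b: approx_mul(a, b, 8)),
                      ("hybrid", hybrid_mul)]:
        err = sum(abs(mul(a, b) - a * b) / (a * b) for a, b in pairs) / len(pairs)
        print(f"{name:9s} mean relative error: {err:.4%}")
```

In this toy setup the hybrid scheme lands between the two single-precision multipliers in accuracy while invoking the cheaper low-precision path for most products; in the paper, the analogous trade-off is evaluated on real CNN workloads rather than random operands.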
