AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers

Low-precision arithmetic for accelerating deep-learning applications on field-programmable gate arrays (FPGAs) has been studied extensively, because it offers the potential to save silicon area or increase throughput. However, these benefits come at the cost of reduced accuracy. In this article, we demonstrate that reconfigurable constant coefficient multipliers (RCCMs) offer a better alternative for saving silicon area than low-precision arithmetic. RCCMs multiply input values by a restricted set of coefficients using only adders, subtractors, bit shifts, and multiplexers (MUXes), meaning that they can be heavily optimized for FPGAs. We propose a family of RCCMs tailored to FPGA logic elements to ensure their efficient utilization. To minimize information loss from quantization, we then develop novel training techniques that map the possible coefficient representations of the RCCMs to neural network weight parameter distributions. This enables the use of RCCMs in hardware while maintaining high accuracy. We demonstrate the benefits of these techniques using the AlexNet, ResNet-18, and ResNet-50 networks. The resulting implementations achieve up to 50% resource savings over traditional 8-bit quantized networks, translating to significant speedups and power savings. Our RCCM with the lowest resource requirements exceeds 6-bit fixed-point accuracy, while all other RCCM implementations match or exceed the accuracy of an 8-bit uniformly quantized design at significantly lower resource cost.
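To illustrate the principle behind an RCCM, the following sketch shows how a coefficient of the restricted form 2^a ± 2^b can be applied with only two bit shifts and one adder/subtractor, where the configuration inputs play the role of the multiplexer select lines. This is a minimal, hypothetical example of the general shift-add idea described in the abstract, not the specific RCCM family proposed in the paper; the function name and parameters are illustrative assumptions.

```python
def rccm_multiply(x: int, shift_a: int, shift_b: int, subtract: bool) -> int:
    """Illustrative RCCM-style multiply: coefficient = 2^shift_a +/- 2^shift_b.

    Realized with two shifters and one adder/subtractor; in hardware the
    (shift_a, shift_b, subtract) configuration would be selected by MUXes,
    so only coefficients of this restricted form are reachable.
    """
    a = x << shift_a          # partial product x * 2^shift_a
    b = x << shift_b          # partial product x * 2^shift_b
    return a - b if subtract else a + b

# Multiply by 6 = 2^3 - 2^1 using one subtraction:
print(rccm_multiply(5, 3, 1, True))   # 5 * 6 = 30
# Multiply by 5 = 2^2 + 2^0 using one addition:
print(rccm_multiply(7, 2, 0, False))  # 7 * 5 = 35
```

Training must then quantize each weight to the nearest coefficient reachable by the chosen configuration space, which is what the paper's mapping of coefficient representations to weight distributions addresses.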
