Residue-Net: Multiplication-free Neural Network by In-situ No-loss Migration to Residue Number Systems

Deep neural networks are widely deployed on embedded devices to solve problems ranging from edge sensing to autonomous driving. The accuracy of these networks is usually proportional to their complexity. Quantizing model parameters (i.e., weights) and/or activations is a popular and powerful technique for reducing this complexity while preserving accuracy. Nonetheless, previous studies have shown that the achievable quantization level is limited, since network accuracy drops beyond a certain point. We propose Residue-Net, a multiplication-free accelerator for neural networks that uses the Residue Number System (RNS) to achieve substantial energy reduction. RNS decomposes each operation into several smaller operations that are simpler to implement. Moreover, Residue-Net replaces the numerous costly multiplications with simple, energy-efficient shift and add operations, further reducing the computational complexity of the network. To evaluate the efficiency of the proposed accelerator, we compare the performance of Residue-Net with a baseline FPGA implementation of four widely used networks: LeNet, AlexNet, VGG16, and ResNet-50. When delivering the same performance as the baseline, Residue-Net reduces area and power (and hence energy) by 36% and 23%, respectively, on average, with no accuracy loss. Leveraging the saved area to accelerate the quantized RNS network through parallelism, Residue-Net improves throughput by 2.8× and energy efficiency by 2.7×.
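
As a rough illustration of the RNS decomposition described above, the following Python sketch shows how a single integer multiplication splits into independent small modular operations and how the result is recovered with the Chinese Remainder Theorem. The moduli set {3, 5, 7} and all function names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of Residue Number System (RNS) arithmetic, assuming the
# pairwise-coprime moduli set {3, 5, 7} (dynamic range 3 * 5 * 7 = 105).
# Moduli and names are illustrative, not the configuration used in Residue-Net.
from math import prod

MODULI = (3, 5, 7)

def to_rns(x, moduli=MODULI):
    """Forward conversion: an integer becomes a tuple of small residues."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Addition is performed independently on each small residue channel."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Multiplication also decomposes into independent small modular products."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))

def from_rns(r, moduli=MODULI):
    """Reverse conversion via the Chinese Remainder Theorem."""
    M = prod(moduli)
    x = 0
    for ri, mi in zip(r, moduli):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)  # pow(Mi, -1, mi) is the modular inverse
    return x % M

# Example: 9 * 8 = 72, computed entirely in the residue domain.
assert from_rns(rns_mul(to_rns(9), to_rns(8))) == 72
```

Because each residue channel is only a few bits wide, the per-channel modular products (or, in hardware, their shift-and-add equivalents) are far cheaper than a full-width multiplication, which is the source of the area and energy savings reported above.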
