Design Methodology for Embedded Approximate Artificial Neural Networks

Artificial neural networks (ANNs) have shown significant promise in recognition and classification applications. Implementing pre-trained ANNs on embedded systems requires representing data and design parameters in low-precision fixed-point formats, which often requires retraining the network. In such implementations, the multiply-accumulate (MAC) operation is the dominant contributor to the high resource and energy requirements. To address these challenges, we present Rox-ANN, a design methodology for implementing ANNs on FPGAs using processing elements (PEs) built from low-precision fixed-point arithmetic and high-performance, reduced-area approximate multipliers. The trained design parameters of the ANN are analyzed and clustered to minimize the total number of distinct approximate multipliers required in the design. With our methodology, the loss in application accuracy is negligible. We evaluated the design on a LeNet-based implementation of the MNIST digit recognition application. Compared to a 16-bit full-precision, accurate-arithmetic PE, a PE using 8-bit weights and activations with approximate arithmetic units achieves 65.6%, 55.1%, and 18.9% reductions in area, energy consumption, and latency, respectively.
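The quantize-then-cluster step lends itself to a compact illustration. The Python sketch below is a minimal interpretation, not the paper's exact algorithm: it rounds a layer's trained weights to signed 8-bit fixed point and then groups them with a plain 1-D k-means, so only as many distinct weight values (and hence approximate-multiplier operands) remain as there are centroids. The Q2.6 fixed-point split, the choice of k-means, the cluster count of 16, and the layer dimensions are all illustrative assumptions.

```python
# Minimal sketch, assuming a Q2.6 fixed-point format and 1-D k-means;
# this is not the paper's exact clustering algorithm.
import numpy as np

FRAC_BITS = 6  # assumed Q2.6 split for signed 8-bit weights

def to_fixed(w, frac_bits=FRAC_BITS):
    """Round float weights to signed 8-bit fixed point."""
    scaled = np.round(w * (1 << frac_bits))
    return np.clip(scaled, -128, 127).astype(np.int8)

def cluster_weights(w_fixed, n_clusters=16, iters=20):
    """Plain 1-D k-means over one layer's fixed-point weights."""
    w = w_fixed.astype(np.float64).ravel()
    centroids = np.linspace(w.min(), w.max(), n_clusters)
    labels = np.zeros(w.size, dtype=int)
    for _ in range(iters):
        # assign each weight to its nearest centroid
        labels = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # move each centroid to the mean of its members
        for k in range(n_clusters):
            members = w[labels == k]
            if members.size:
                centroids[k] = members.mean()
    # snap centroids back onto the 8-bit fixed-point grid
    centroids = np.clip(np.round(centroids), -128, 127).astype(np.int8)
    return centroids[labels].reshape(w_fixed.shape)

# Example: a hypothetical fully connected layer (LeNet-like 120x84 shape)
rng = np.random.default_rng(0)
w_float = rng.normal(0.0, 0.5, size=(120, 84))
w_clustered = cluster_weights(to_fixed(w_float))
print("distinct weight values:", np.unique(w_clustered).size)  # <= 16
```

After clustering, each multiplier in a PE only ever sees one of the centroid values as its weight operand, which is what makes it possible to share a small pool of approximate multipliers across the layer instead of provisioning one per weight.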
