Accelerating the Inference Phase in Ternary Convolutional Neural Networks Using Configurable Processors

The need to automate tasks has led to state-of-the-art algorithms such as Convolutional Neural Networks (CNNs), which can learn representations from images. CNNs are computationally expensive and are therefore mostly implemented on Graphics Processing Units (GPUs), which consume significant power, occupy a large area, and are costly. These factors make CNNs difficult to deploy in embedded systems with severe space and power constraints. Application-Specific Instruction-Set Processors (ASIPs) offer a low-power alternative to GPUs. However, floating-point computation and storage are expensive and do not lend themselves well to an ASIP implementation. A more efficient approach to implementing CNNs on ASIPs is to express the data and parameters in binary, ternary, or fixed-point representations. In this paper, we discuss our implementation of the inference phase of ternary CNNs on an Xtensa LX6 ASIP and measure its performance on the MNIST dataset. The compute-intensive operations are parallelized with custom instructions written in the Tensilica Instruction Extension (TIE) language. Compared to a sequential implementation, our results indicate a speedup of 4.7× with an area overhead of only 20%.
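To make the approach concrete, below is a minimal C sketch of the core operation such custom instructions would parallelize: a ternary dot product. The 2-bit weight encoding and the function names (`ternary_weight`, `ternary_dot16`) are illustrative assumptions, not taken from the paper. With weights restricted to {-1, 0, +1}, every multiplication reduces to a conditional add, subtract, or skip, which is why ternary inference maps well onto a processor without floating-point hardware.

```c
#include <stdint.h>

/* Decode the 2-bit field at lane i of a packed weight word.
 * Assumed encoding: 00 -> 0, 01 -> +1, 11 -> -1. */
static inline int ternary_weight(uint32_t packed, int i) {
    uint32_t bits = (packed >> (2 * i)) & 0x3u;
    if (bits == 0x1u) return  1;   /* 01 -> +1 */
    if (bits == 0x3u) return -1;   /* 11 -> -1 */
    return 0;                      /* 00 ->  0 */
}

/* Dot product of 16 activations with 16 ternary weights packed into one
 * 32-bit word. A TIE custom instruction could evaluate all 16 lanes in
 * parallel; this scalar loop shows the sequential reference behavior. */
int32_t ternary_dot16(const int8_t act[16], uint32_t packed_weights) {
    int32_t acc = 0;
    for (int i = 0; i < 16; ++i) {
        int w = ternary_weight(packed_weights, i);
        if (w > 0)      acc += act[i];   /* +1: add      */
        else if (w < 0) acc -= act[i];   /* -1: subtract */
        /* 0: skip -- no multiplier is needed anywhere. */
    }
    return acc;
}
```

Packing sixteen weights into a single 32-bit word also illustrates the storage saving over floating point: one word replaces sixteen 32-bit floats for the weights of a 16-wide filter slice.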
