A Low-Power Deep Neural Network Online Learning Processor for Real-Time Object Tracking Application

A deep neural network (DNN) online learning processor is proposed with high throughput and low power consumption to achieve real-time object tracking on mobile devices. Four key features enable low-power DNN online learning. First, the processor is designed with a unified core architecture and achieves $1.33\times$ higher throughput than the previous state-of-the-art DNN learning processor. Second, two new algorithms, binary feedback alignment (BFA) and dynamic fixed-point-based run-length compression (RLC), reduce power consumption by cutting external memory accesses (EMA); BFA and dynamic fixed-point-based RLC reduce EMA by 11.4% and 32.5%, respectively. Third, new data feeding units, an integral RLC (iRLC) decoder and a transpose RLC (tRLC) decoder, are co-designed with the proposed algorithms to maximize throughput. Finally, a dropout controller removes redundant power consumption in the unified core and the data feeding architecture through a dynamic clock-gating scheme, enabling the processor to perform DNN online learning with 38.1% lower power consumption. Implemented in 65 nm CMOS technology, the 3.52 mm$^2$ DNN online learning processor consumes 126 mW and achieves 30.4 frames-per-second throughput in the object tracking application.
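
To illustrate the idea behind feedback alignment, on which BFA builds, the sketch below propagates the output error through a fixed random $\pm 1$ feedback matrix instead of the transposed forward weights, so the transposed weight matrix never has to be fetched from external memory during the backward pass. The layer sizes, learning rate, and the exact form of the binary feedback matrix are illustrative assumptions, not the processor's actual configuration.

```python
import numpy as np

# Minimal sketch of feedback-alignment-style error propagation with a
# binary (+/-1) feedback matrix, assuming a single ReLU hidden layer.
# All shapes and hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 64, 128, 10
W1 = rng.normal(0, 0.1, (n_hid, n_in))        # forward weights, layer 1
W2 = rng.normal(0, 0.1, (n_out, n_hid))       # forward weights, layer 2
B2 = rng.choice([-1.0, 1.0], (n_hid, n_out))  # fixed binary feedback matrix (replaces W2.T)

x = rng.normal(size=(n_in,))
t = np.zeros(n_out)
t[3] = 1.0                                    # one-hot target

# Forward pass
h_pre = W1 @ x
h = np.maximum(h_pre, 0.0)                    # ReLU
y = W2 @ h

# Backward pass: the hidden-layer error uses B2 instead of W2.T,
# so the transposed forward weights are never read back.
e_out = y - t
e_hid = (B2 @ e_out) * (h_pre > 0)

lr = 0.01
W2 -= lr * np.outer(e_out, h)
W1 -= lr * np.outer(e_hid, x)
```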

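The sketch below shows a simple zero run-length encoder and decoder of the kind used to compress ReLU-sparse activations before they are written to external memory. The pair-based format and the 4-bit run field are assumptions for illustration only, and the dynamic fixed-point quantization step is omitted.

```python
from typing import List, Tuple

def rlc_encode(values: List[int], max_run: int = 15) -> List[Tuple[int, int]]:
    """Zero run-length encoding: emit (zero_run, value) pairs.

    Minimal sketch of compressing ReLU-sparse activations; the 4-bit run
    field (max_run = 15) is an illustrative assumption, not the
    processor's actual memory format.
    """
    out, run = [], 0
    for v in values:
        if v == 0 and run < max_run:
            run += 1                      # extend the current run of zeros
        else:
            out.append((run, v))          # flush the run together with this value
            run = 0
    if run:
        out.append((run - 1, 0))          # trailing zeros: final pair's value is itself a zero
    return out

def rlc_decode(pairs: List[Tuple[int, int]]) -> List[int]:
    """Inverse of rlc_encode: expand the zero runs back into the activation stream."""
    out: List[int] = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    return out

# Example: a ReLU-sparse activation slice round-trips through the codec.
acts = [0, 0, 3, 0, 5, 0, 0, 0]
assert rlc_decode(rlc_encode(acts)) == acts
```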
[1]  Bohyung Han,et al.  BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Hoi-Jun Yoo,et al.  A 141.4 mW Low-Power Online Deep Neural Network Training Processor for Real-time Object Tracking in Mobile Devices , 2018, 2018 IEEE International Symposium on Circuits and Systems (ISCAS).

[3]  Marian Verhelst,et al.  A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets , 2016, 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits).

[4]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[5]  Stan Sclaroff,et al.  MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization , 2014, ECCV.

[6]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Zhe Chen,et al.  MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Colin J. Akerman,et al.  Random synaptic feedback weights support error backpropagation for deep learning , 2016, Nature Communications.

[10]  Song Han,et al.  EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[11]  Hoi-Jun Yoo,et al.  A 21mW low-power recurrent neural network accelerator with quantization tables for embedded deep learning applications , 2017, 2017 IEEE Asian Solid-State Circuits Conference (A-SSCC).

[12]  Hoi-Jun Yoo,et al.  UNPU: A 50.6TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[13]  Michael Felsberg,et al.  Accurate Scale Estimation for Robust Visual Tracking , 2014, BMVC.

[14]  Vivienne Sze,et al.  14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks , 2016, ISSCC.

[15]  Saibal Mukhopadhyay,et al.  A Power-Aware Digital Multilayer Perceptron Accelerator with On-Chip Training Based on Approximate Computing , 2017, IEEE Transactions on Emerging Topics in Computing.

[16]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[17]  Abhinav Gupta,et al.  Transferring Rich Feature Hierarchies for Robust Visual Tracking , 2015, ArXiv.

[18]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[19]  Hoi-Jun Yoo,et al.  14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[20]  Hoi-Jun Yoo,et al.  A 9.02mW CNN-stereo-based real-time 3D hand-gesture recognition processor for smart mobile devices , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[21]  Tadahiro Kuroda,et al.  BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS , 2017, 2017 Symposium on VLSI Circuits.

[22]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[23]  Jin Young Choi,et al.  Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Guangwen Yang,et al.  F-CNN: An FPGA-based framework for training Convolutional Neural Networks , 2016, 2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP).

[26]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[27]  Michael Felsberg,et al.  The Visual Object Tracking VOT2013 Challenge Results , 2013, ICCV 2013.

[28]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[29]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Tadahiro Kuroda,et al.  QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[31]  Konrad Schindler,et al.  Online Multi-Target Tracking Using Recurrent Neural Networks , 2016, AAAI.

[32]  David Blaauw,et al.  14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[33]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[34]  Mark Horowitz,et al.  1.1 Computing's energy problem (and what we can do about it) , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[35]  Youchang Kim,et al.  14.6 A 0.62mW ultra-low-power convolutional-neural-network face-recognition processor and a CIS integrated with always-on haar-like face detector , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).