A Low-Power Deep Neural Network Online Learning Processor for Real-Time Object Tracking Application

A deep neural network (DNN) online learning processor is proposed with high throughput and low power consumption to achieve real-time object tracking on mobile devices. Four key features enable low-power DNN online learning. First, the processor is designed with a unified core architecture and achieves $1.33\times$ higher throughput than the previous state-of-the-art DNN learning processor. Second, two new algorithms, binary feedback alignment (BFA) and dynamic fixed-point-based run-length compression (RLC), reduce power consumption by cutting external memory accesses (EMA); BFA and dynamic fixed-point-based RLC reduce EMA by 11.4% and 32.5%, respectively. Third, new data feeding units, an integral RLC (iRLC) decoder and a transpose RLC (tRLC) decoder, are co-designed with the proposed algorithms to maximize throughput. Finally, a dropout controller removes redundant power consumption in the unified core and the data feeding architecture through a dynamic clock-gating scheme, enabling the processor to perform DNN online learning with 38.1% lower power consumption. Implemented in 65 nm CMOS technology, the 3.52 mm$^2$ DNN online learning processor consumes 126 mW and achieves 30.4 frames-per-second throughput in the object tracking application.
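
To illustrate the idea behind feedback alignment, on which BFA builds, the sketch below propagates the output error through a fixed random $\pm 1$ feedback matrix instead of the transposed forward weights, so the transposed weight matrix never has to be fetched from external memory during the backward pass. The layer sizes, learning rate, and the exact form of the binary feedback matrix are illustrative assumptions, not the processor's actual configuration.

```python
import numpy as np

# Minimal sketch of feedback-alignment-style error propagation with a
# binary (+/-1) feedback matrix, assuming a single ReLU hidden layer.
# All shapes and hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 64, 128, 10
W1 = rng.normal(0, 0.1, (n_hid, n_in))        # forward weights, layer 1
W2 = rng.normal(0, 0.1, (n_out, n_hid))       # forward weights, layer 2
B2 = rng.choice([-1.0, 1.0], (n_hid, n_out))  # fixed binary feedback matrix (replaces W2.T)

x = rng.normal(size=(n_in,))
t = np.zeros(n_out)
t[3] = 1.0                                    # one-hot target

# Forward pass
h_pre = W1 @ x
h = np.maximum(h_pre, 0.0)                    # ReLU
y = W2 @ h

# Backward pass: the hidden-layer error uses B2 instead of W2.T,
# so the transposed forward weights are never read back.
e_out = y - t
e_hid = (B2 @ e_out) * (h_pre > 0)

lr = 0.01
W2 -= lr * np.outer(e_out, h)
W1 -= lr * np.outer(e_hid, x)
```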

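The sketch below shows a simple zero run-length encoder and decoder of the kind used to compress ReLU-sparse activations before they are written to external memory. The pair-based format and the 4-bit run field are assumptions for illustration only, and the dynamic fixed-point quantization step is omitted.

```python
from typing import List, Tuple

def rlc_encode(values: List[int], max_run: int = 15) -> List[Tuple[int, int]]:
    """Zero run-length encoding: emit (zero_run, value) pairs.

    Minimal sketch of compressing ReLU-sparse activations; the 4-bit run
    field (max_run = 15) is an illustrative assumption, not the
    processor's actual memory format.
    """
    out, run = [], 0
    for v in values:
        if v == 0 and run < max_run:
            run += 1                      # extend the current run of zeros
        else:
            out.append((run, v))          # flush the run together with this value
            run = 0
    if run:
        out.append((run - 1, 0))          # trailing zeros: final pair's value is itself a zero
    return out

def rlc_decode(pairs: List[Tuple[int, int]]) -> List[int]:
    """Inverse of rlc_encode: expand the zero runs back into the activation stream."""
    out: List[int] = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    return out

# Example: a ReLU-sparse activation slice round-trips through the codec.
acts = [0, 0, 3, 0, 5, 0, 0, 0]
assert rlc_decode(rlc_encode(acts)) == acts
```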
[1]  Bohyung Han,et al.  BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Hoi-Jun Yoo,et al.  A 141.4 mW Low-Power Online Deep Neural Network Training Processor for Real-time Object Tracking in Mobile Devices , 2018, 2018 IEEE International Symposium on Circuits and Systems (ISCAS).

[3]  Marian Verhelst,et al.  A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets , 2016, 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits).

[4]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[5]  Stan Sclaroff,et al.  MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization , 2014, ECCV.

[6]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Zhe Chen,et al.  MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Colin J. Akerman,et al.  Random synaptic feedback weights support error backpropagation for deep learning , 2016, Nature Communications.

[10]  Song Han,et al.  EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[11]  Hoi-Jun Yoo,et al.  A 21mW low-power recurrent neural network accelerator with quantization tables for embedded deep learning applications , 2017, 2017 IEEE Asian Solid-State Circuits Conference (A-SSCC).

[12]  Hoi-Jun Yoo,et al.  UNPU: A 50.6TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[13]  Michael Felsberg,et al.  Accurate Scale Estimation for Robust Visual Tracking , 2014, BMVC.

[14]  Vivienne Sze,et al.  14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks , 2016, ISSCC.

[15]  Saibal Mukhopadhyay,et al.  A Power-Aware Digital Multilayer Perceptron Accelerator with On-Chip Training Based on Approximate Computing , 2017, IEEE Transactions on Emerging Topics in Computing.

[16]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[17]  Abhinav Gupta,et al.  Transferring Rich Feature Hierarchies for Robust Visual Tracking , 2015, ArXiv.

[18]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[19]  Hoi-Jun Yoo,et al.  14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[20]  Hoi-Jun Yoo,et al.  A 9.02mW CNN-stereo-based real-time 3D hand-gesture recognition processor for smart mobile devices , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[21]  Tadahiro Kuroda,et al.  BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS , 2017, 2017 Symposium on VLSI Circuits.

[22]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[23]  Jin Young Choi,et al.  Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Guangwen Yang,et al.  F-CNN: An FPGA-based framework for training Convolutional Neural Networks , 2016, 2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP).

[26]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[27]  Michael Felsberg,et al.  The Visual Object Tracking VOT2013 Challenge Results , 2013, ICCV 2013.

[28]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[29]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Tadahiro Kuroda,et al.  QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[31]  Konrad Schindler,et al.  Online Multi-Target Tracking Using Recurrent Neural Networks , 2016, AAAI.

[32]  David Blaauw,et al.  14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[33]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[34]  Mark Horowitz,et al.  1.1 Computing's energy problem (and what we can do about it) , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[35]  Youchang Kim,et al.  14.6 A 0.62mW ultra-low-power convolutional-neural-network face-recognition processor and a CIS integrated with always-on haar-like face detector , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).