MiniTracker: A Lightweight CNN-based System for Visual Object Tracking on Embedded Device

Visual object tracking (VOT) is a computer vision task with a wide range of applications. However, state-of-the-art VOT algorithms based on deep learning are computationally intensive and require large amounts of storage. Furthermore, although many deep learning accelerators have been proposed, many of them use general-purpose architectures. In this paper, we therefore propose MiniTracker, a lightweight CNN-based system that integrates algorithm and hardware design and is particularly efficient for VOT. Because we use a fully-convolutional Siamese network, the network parameters do not require online training, which dramatically reduces computation. We adapt the original Siamese network (SN) for efficient hardware implementation through parameter pruning and quantization, producing a lightweight CNN with 8-bit parameters that occupies only 1.939 MB. On a ZedBoard, the system tracks at 18.6 frames per second while consuming 1.284 W. Moreover, compared with other hardware implementations, our system is robust to challenging scenarios such as occlusion, appearance change, and illumination variation.
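In the fully-convolutional Siamese formulation (SiamFC, Bertinetto et al.), an embedding of the target exemplar is computed once from the first frame, and each new frame is scored by cross-correlating that embedding with the embedding of a larger search region; the peak of the resulting score map gives the target's new position. Because the embedding weights are trained offline and then kept fixed, tracking needs only forward passes plus this correlation, with no online weight updates. The following is a minimal pure-Python sketch of the correlation step only, assuming the embeddings have already been computed and are stored as channel × row × column nested lists. The names `cross_correlate` and `locate_peak` are hypothetical and are not taken from the MiniTracker implementation.

```python
from typing import List, Tuple

# Assumed layout: channels x rows x columns, as plain nested lists.
Tensor3 = List[List[List[float]]]


def cross_correlate(template: Tensor3, search: Tensor3, bias: float = 0.0) -> List[List[float]]:
    """Dense 'valid' cross-correlation of a C x h x w template over a C x H x W
    search map, summed over channels, mirroring the SiamFC score function
    f(z, x) = phi(z) * phi(x) + b (phi is assumed to have been applied already)."""
    if len(template) != len(search):
        raise ValueError("template and search must have the same number of channels")
    c = len(template)
    h, w = len(template[0]), len(template[0][0])
    H, W = len(search[0]), len(search[0][0])
    if h > H or w > W:
        raise ValueError("template must not be larger than the search region")
    scores = []
    for i in range(H - h + 1):          # vertical offset of the template
        row = []
        for j in range(W - w + 1):      # horizontal offset of the template
            s = bias
            for k in range(c):          # sum the correlation over all channels
                for u in range(h):
                    for v in range(w):
                        s += template[k][u][v] * search[k][i + u][j + v]
            row.append(s)
        scores.append(row)
    return scores


def locate_peak(scores: List[List[float]]) -> Tuple[int, int]:
    """Return (row, col) of the highest score; the first maximum wins on ties."""
    positions = [(i, j) for i in range(len(scores)) for j in range(len(scores[0]))]
    return max(positions, key=lambda p: scores[p[0]][p[1]])


# Usage: a 1-channel diagonal template hidden at offset (1, 2) in a 4x4 search map.
search = [[[0, 0, 0, 0],
           [0, 0, 1, 0],
           [0, 0, 0, 1],
           [0, 0, 0, 0]]]
score_map = cross_correlate([[[1, 0], [0, 1]]], search)
print(score_map)               # [[0.0, 1.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]]
print(locate_peak(score_map))  # (1, 2)
```

A full tracker would also upsample the score map and map the peak back to image coordinates; those steps are omitted here.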
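The abstract does not specify the exact pruning or quantization scheme. Purely as an illustration, the sketch below pairs two common choices: unstructured magnitude pruning, which zeroes the smallest-magnitude weights, and symmetric linear quantization to signed 8-bit integers with one scale per tensor. Both techniques are assumptions here, and the helper names (`magnitude_prune`, `quantize_int8`, `dequantize`, `storage_bytes`) are hypothetical. The `storage_bytes` helper shows the basic arithmetic behind the size reduction: at 8 bits per parameter, dense weight storage is one quarter of that of a 32-bit floating-point model.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero the `sparsity` fraction of weights with the smallest absolute value.
    Assumption: unstructured magnitude pruning, not necessarily MiniTracker's criterion."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to remove
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])           # indices of the k smallest-magnitude weights
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric linear quantization to integers in [-127, 127] with one shared scale.
    Assumption: per-tensor scaling; the paper's fixed-point format may differ."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map 8-bit integers back to approximate real-valued weights."""
    return [v * scale for v in q]


def storage_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store num_params dense parameters of the given bit width."""
    return (num_params * bits + 7) // 8


w = [0.5, -0.02, 1.0, -0.8, 0.01, 0.3]
pruned = magnitude_prune(w, 0.5)
print(pruned)  # [0.5, 0.0, 1.0, -0.8, 0.0, 0.0]
q, scale = quantize_int8(pruned)
print(storage_bytes(1_000_000, 32) // storage_bytes(1_000_000, 8))  # 4
```

Pruned zeros only save memory if the weights are stored in a sparse format or whole filters are removed; with dense storage, the saving in this sketch comes from the 8-bit width alone.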
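The reported throughput and power translate directly into per-frame latency and energy. Assuming the 1.284 W figure is the average power drawn during continuous tracking, with power $P$ and frame rate $r$:

$$
t_{\text{frame}} = \frac{1}{r} = \frac{1}{18.6\ \text{s}^{-1}} \approx 53.8\ \text{ms},
\qquad
E_{\text{frame}} = \frac{P}{r} = \frac{1.284\ \text{W}}{18.6\ \text{s}^{-1}} \approx 69.0\ \text{mJ}.
$$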
