Robust visual tracking based on convolutional neural network with extreme learning machine

Recently, deep learning has attracted substantial attention as a promising solution to many problems in computer vision. Among the various deep learning architectures, the convolutional neural network (CNN) has demonstrated superior performance as a feature learning method. In this paper, we present a novel hybrid model of CNN and extreme learning machine (ELM) for object tracking. Training a conventional CNN requires a substantial amount of computation and a large dataset. ELM, by contrast, randomly generates the parameters of its hidden layers and computes the weights between the hidden and output layers via the regularized least-squares method, thereby dramatically reducing the learning time while producing accurate results with minimal training data. Therefore, we integrate the ELM auto-encoder architecture into the CNN model. In addition, an effective updating scheme is designed for model training to overcome the tracking drift problem. The joint CNN-ELM tracker is robust to object variations such as illumination, occlusion, and rotation in a video sequence. Numerous experiments on various challenging videos demonstrate that the proposed tracker performs favourably compared to several state-of-the-art methods.
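
The ELM training step described above (random hidden-layer parameters, output weights obtained by regularized least squares) can be illustrated with a minimal NumPy sketch. This is not the tracker proposed in the paper: the function names `train_elm` and `predict_elm`, the tanh activation, the hidden-layer size, and the regularization constant `C` are all assumptions chosen for illustration, and the CNN feature extractor and the updating scheme are omitted.

```python
import numpy as np

def train_elm(X, T, n_hidden=64, C=1e3, seed=0):
    """Fit a single-hidden-layer ELM (illustrative sketch, not the paper's model).

    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets.
    C is an assumed regularization constant (larger C means weaker regularization).
    """
    rng = np.random.default_rng(seed)
    # Hidden-layer parameters are drawn at random and never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer output matrix
    # Regularized least squares: beta = (H^T H + I/C)^{-1} H^T T
    A = H.T @ H + np.eye(n_hidden) / C
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta

if __name__ == "__main__":
    # Toy regression problem, used only to exercise the functions.
    X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
    T = np.sin(3.0 * X)
    W, b, beta = train_elm(X, T)
    mse = np.mean((predict_elm(X, W, b, beta) - T) ** 2)
    print("training MSE:", mse)
```

Because only `beta` is learned, and it comes from a single linear solve, no iterative back-propagation is needed, which is the source of the reduced learning time mentioned in the abstract. Passing the inputs themselves as targets (`T = X`) gives an auto-encoder-style variant of the same procedure; how the paper couples this with the CNN layers is not specified here.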
