Visual Object Tracking Based on Cross-Modality Gaussian-Bernoulli Deep Boltzmann Machines with RGB-D Sensors

Visual object tracking is a key problem in computer vision. In this paper, we propose a visual object tracking algorithm based on cross-modality deep feature learning using Gaussian-Bernoulli deep Boltzmann machines (DBMs) with RGB-D sensors. First, a cross-modality feature learning network based on a Gaussian-Bernoulli DBM is constructed to extract cross-modality features from the samples in RGB-D video data. Second, the cross-modality features of the samples are fed into a logistic regression classifier, and the observation likelihood model is established from the classifier's confidence score. Finally, the object tracking results over RGB-D data are obtained using a Bayesian maximum a posteriori (MAP) estimation algorithm. The experimental results show that the proposed method is robust to abnormal changes such as occlusion, rotation, and illumination change. The algorithm tracks multiple targets stably and achieves higher accuracy.
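The last two steps of the pipeline (classifier confidence as observation likelihood, followed by MAP selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors stand in for the cross-modality DBM features, the isotropic Gaussian motion prior and its `sigma` are assumptions the abstract does not specify, and all function names and parameters are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping classifier scores to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def observation_likelihood(features, weights, bias):
    """Logistic-regression confidence per candidate, used as p(z_t | x_t).

    `features` is a placeholder for the cross-modality features that the
    Gaussian-Bernoulli DBM would produce for each candidate sample.
    """
    return sigmoid(features @ weights + bias)


def motion_prior(candidates, previous_state, sigma):
    """Unnormalized Gaussian prior p(x_t | x_{t-1}); an assumed motion model."""
    sq_dist = np.sum((candidates - previous_state) ** 2, axis=1)
    return np.exp(-0.5 * sq_dist / sigma ** 2)


def map_estimate(candidates, features, previous_state, weights, bias, sigma=10.0):
    """Return the candidate state maximizing likelihood times prior."""
    posterior = (observation_likelihood(features, weights, bias)
                 * motion_prior(candidates, previous_state, sigma))
    best = int(np.argmax(posterior))
    return candidates[best], posterior[best]


# Toy data: three candidate positions (x, y) with one-dimensional features.
candidates = np.array([[0.0, 0.0], [5.0, 5.0], [50.0, 50.0]])
features = np.array([[-1.0], [2.0], [3.0]])
weights = np.array([1.0])
previous_state = np.array([0.0, 0.0])

state, score = map_estimate(candidates, features, previous_state, weights, bias=0.0)
print(state, score)
```

In this toy setup the distant candidate has the highest classifier confidence, but the motion prior suppresses it, so the nearby candidate at (5, 5) is selected. This shows how the posterior balances appearance evidence against motion continuity.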
