Robust Fusion of Color and Depth Data for RGB-D Target Tracking Using Adaptive Range-Invariant Depth Models and Spatio-Temporal Consistency Constraints

This paper presents a novel robust method for single-target tracking in RGB-D images, and also contributes a substantial new benchmark dataset for evaluating RGB-D trackers. While a target object’s color distribution is reasonably motion-invariant, this is not true of the target’s depth distribution, which continually varies as the target moves relative to the camera. It is therefore nontrivial to design target models that can fully exploit (potentially very rich) depth information for target tracking. For this reason, much of the previous RGB-D literature relies on color information for tracking, while exploiting depth information only for occlusion reasoning. In contrast, we propose an adaptive range-invariant target depth model, and show how both depth and color information can be fully and adaptively fused during the search for the target in each new RGB-D image. We introduce a new, hierarchical, two-layered target model (comprising local and global models) which uses spatio-temporal consistency constraints to achieve stable and robust on-the-fly target relearning. In the global layer, multiple features, derived from both color and depth data, are adaptively fused to find a candidate target region. In ambiguous frames, where one or more features disagree, this global candidate region is further decomposed into smaller local candidate regions for matching to local-layer models of small target parts. We also note that the conventional use of depth data for occlusion reasoning can easily trigger false occlusion detections when the target moves rapidly toward the camera. To overcome this problem, we show how combining target information with contextual information enables the target’s depth constraint to be relaxed. Our adaptively relaxed depth constraints can robustly accommodate large and rapid target motion in the depth direction, while still enabling the use of depth data for highly accurate reasoning about occlusions.
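The two key mechanisms described above, adaptive fusion of color and depth cues and a depth-based occlusion test whose tolerance relaxes as the target approaches the camera, can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the reliability-weighted linear fusion, and the speed-proportional relaxation gain are illustrative assumptions.

```python
import numpy as np

def fuse_likelihoods(color_lik, depth_lik, color_rel, depth_rel):
    """Adaptively fuse per-pixel color and depth likelihood maps.

    Reliability weights are normalized so that, frame by frame, the more
    trustworthy cue dominates the fused target-location likelihood.
    """
    w_color = color_rel / (color_rel + depth_rel)
    w_depth = 1.0 - w_color
    return w_color * color_lik + w_depth * depth_lik

def is_occluded(depth_patch, expected_depth, approach_speed,
                base_tol=0.15, relax_gain=0.5, occ_ratio=0.4):
    """Flag occlusion when enough pixels lie well in front of the target.

    The depth tolerance is relaxed in proportion to the target's speed
    toward the camera, so that rapid approach motion is not mistaken
    for an occluding object entering the foreground.
    """
    tol = base_tol + relax_gain * max(approach_speed, 0.0)
    closer = depth_patch < expected_depth * (1.0 - tol)
    return float(np.mean(closer)) > occ_ratio
```

With a fixed tolerance, every pixel of a fast-approaching target falls in front of its previously expected depth and triggers a false occlusion; widening the band with `approach_speed` suppresses that failure mode while leaving the test sensitive to genuine occluders.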
For evaluation, we introduce a new RGB-D benchmark dataset with per-frame annotated attributes and extensive bias analysis. Our tracker is evaluated using two different state-of-the-art methodologies, VOT and the Object Tracking Benchmark (OTB), and in both cases it significantly outperforms four other state-of-the-art RGB-D trackers from the literature.
