Robust object tracking via multi-cue fusion

A long-term object tracking method for calibrated binocular cameras is proposed; it fuses information from the two channels with binocular geometry constraints. A stereo filter, built on the epipolar geometry of the binocular cameras, effectively filters out false detections proposed by a pre-trained object detector on both channels. Experimental results demonstrate that the proposed method handles occlusion, scale variation, and out-of-view situations well.

Object tracking is a fundamental problem in computer vision and has been an active research area over the last decade. Although a wide variety of methods have been proposed, long-term object tracking remains challenging under occlusion, out-of-view motion, and scale and illumination variation. To address these challenges, we propose a robust visual object tracking method based on binocular vision. Our method formulates object tracking in a multi-cue fusion framework, which allows the system to recover from tracking drift and occlusion. For each channel of the binocular cameras, a coarse object state is estimated by combining information from a motion model, a detector, and an online tracker. A stereo filter is designed to check the consistency of object candidates between the two channels. The final object state estimate is determined by fusing the two-channel information with the binocular geometry constraints. Experimental results demonstrate the effectiveness of the proposed method.
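The abstract does not spell out how the stereo filter is computed, so the following is only a minimal sketch of one common way to test candidate consistency with epipolar geometry. It assumes a known fundamental matrix F from calibration and compares candidate centers from the two channels by their distance to the epipolar line. The names `epipolar_distance`, `stereo_filter`, `max_dist`, and the rectified-pair matrix `F_RECTIFIED` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from math import hypot

# Assumption: fundamental matrix of an ideal rectified stereo pair, for which
# the epipolar constraint reduces to "matching points share the same row".
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]


def epipolar_distance(F, left_pt, right_pt):
    """Pixel distance from right_pt to the epipolar line of left_pt (F @ x)."""
    x = (left_pt[0], left_pt[1], 1.0)
    # Epipolar line in the right image: a*u + b*v + c = 0
    a, b, c = (sum(F[i][j] * x[j] for j in range(3)) for i in range(3))
    norm = hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return abs(a * right_pt[0] + b * right_pt[1] + c) / norm


def stereo_filter(F, left_centers, right_centers, max_dist=2.0):
    """Return index pairs (i, j) of candidates consistent across both channels.

    A left candidate i and a right candidate j are kept only if the right
    center lies within max_dist pixels of the epipolar line of the left center.
    Candidates that appear in no pair can be treated as false detections.
    """
    return [
        (i, j)
        for i, left in enumerate(left_centers)
        for j, right in enumerate(right_centers)
        if epipolar_distance(F, left, right) <= max_dist
    ]


if __name__ == "__main__":
    left = [(100.0, 50.0), (200.0, 120.0)]   # detector candidates, left channel
    right = [(80.0, 51.0), (300.0, 200.0)]   # detector candidates, right channel
    print(stereo_filter(F_RECTIFIED, left, right))
```

In this sketch, the second candidate in each channel has no consistent partner in the other channel and would be discarded. A real system would likely also compare box size, appearance, and disparity range, since the epipolar constraint alone only restricts matches to a line.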
